français → Ghomala — French-pivot MT (fp32)
Original fp32 MarianMT translation model for Cameroonian languages, part of the Lingo / NativeAI language-preservation project (first released September 2024).
Translation runs through French as a pivot (e.g. Ghomálá' → français → Ewondo),
so each language only needs an X→français and a français→X model. The models were
trained on a corpus we compiled by hand — scripture as the aligned backbone plus
gathered books, pamphlets and other written material — normalised under the AGLC
alphabet (Alphabet Général des Langues Camerounaises). Background: the
research log.
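The pivot scheme described above amounts to composing two translation steps. The following is a minimal sketch of that idea, not the project's actual pipeline: `pivot_translate`, `models_needed` and the `Translator` alias are hypothetical names, and each callable is assumed to wrap one X→français or français→X model.

```python
from typing import Callable

# Hypothetical type: any function that maps a source string to a translated string.
Translator = Callable[[str], str]


def pivot_translate(text: str, to_french: Translator, from_french: Translator) -> str:
    """Translate via French: source language -> français -> target language."""
    french = to_french(text)
    return from_french(french)


def models_needed(n_languages: int) -> dict:
    """Illustrative arithmetic: model counts for n non-French languages."""
    return {
        # One X->français and one français->X model per language.
        "pivot": 2 * n_languages,
        # One model per ordered pair of languages when translating directly.
        "direct": n_languages * (n_languages - 1),
    }
```

With five languages, for instance, `models_needed(5)` gives 10 pivot models against 20 direct ones, at the cost of running two translation passes per request.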
Compressed serving bundle
For deployment we serve int8 CTranslate2
conversions (~3.8× smaller, ~6× faster on CPU), bundled at
flagship-ai/cameroon-int8.
These fp32 weights are the original reference weights.
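For readers who want to try the int8 conversions, the sketch below follows the token round-trip that CTranslate2 documents for MarianMT models. It is an assumption-laden illustration: `translate_with_ct2` is a hypothetical helper, the local model path is a placeholder, and the layout of the bundle on disk is not described in this card.

```python
def translate_with_ct2(translator, tokenizer, text: str) -> str:
    """Translate one sentence with a CTranslate2 translator and a Hugging Face tokenizer."""
    # CTranslate2 consumes token strings rather than token ids.
    source_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(text))
    results = translator.translate_batch([source_tokens])
    # Take the best hypothesis for the single input sentence.
    target_tokens = results[0].hypotheses[0]
    return tokenizer.decode(
        tokenizer.convert_tokens_to_ids(target_tokens),
        skip_special_tokens=True,
    )


# Assumed wiring (not executed here; needs ctranslate2, transformers and
# sentencepiece installed, plus a local copy of the converted model directory;
# the path below is a placeholder):
#
#   import ctranslate2
#   from transformers import MarianTokenizer
#   translator = ctranslate2.Translator(
#       "path/to/francais-ghomala", device="cpu", compute_type="int8"
#   )
#   tokenizer = MarianTokenizer.from_pretrained(
#       "flagship-ai/cameroon-int8", subfolder="francais-ghomala"
#   )
#   print(translate_with_ct2(translator, tokenizer, "Bonjour, comment vas-tu ?"))
```

Passing the translator and tokenizer in as arguments keeps the helper independent of where the bundle is stored.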
Usage
```python
from transformers import MarianMTModel, MarianTokenizer

# fp32 reference weights for the français → Ghomala direction.
model = MarianMTModel.from_pretrained("flagship-ai/francais-ghomala")
# Shared SentencePiece tokenizer ships in the int8 bundle subfolders:
tok = MarianTokenizer.from_pretrained("flagship-ai/cameroon-int8", subfolder="francais-ghomala")

# Tokenize a French sentence, generate, and decode the translation.
ids = tok("Bonjour, comment vas-tu ?", return_tensors="pt")
print(tok.decode(model.generate(**ids)[0], skip_special_tokens=True))
```
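To translate several sentences at once, the single-sentence call above can be wrapped in a small batching helper. This is a sketch under stated assumptions: `translate_sentences` is a hypothetical name, `model` and `tok` are the objects loaded in the usage snippet, and the batch size is an arbitrary default rather than a tuned value.

```python
def translate_sentences(model, tok, sentences, batch_size=8, **generate_kwargs):
    """Translate a list of French sentences in fixed-size batches."""
    outputs = []
    for start in range(0, len(sentences), batch_size):
        chunk = sentences[start:start + batch_size]
        # Pad to the longest sentence in the chunk; truncate over-long inputs.
        batch = tok(chunk, return_tensors="pt", padding=True, truncation=True)
        # Extra keyword arguments are forwarded unchanged to generate().
        generated = model.generate(**batch, **generate_kwargs)
        outputs.extend(tok.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

A call might look like `translate_sentences(model, tok, ["Bonjour.", "Merci beaucoup."], num_beams=4, max_new_tokens=64)`. Both are standard `generate` arguments, and the values shown are illustrative only.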
Limitations
Trained on a small, formal-leaning corpus, so it is strongest on simple, everyday sentences and weaker on modern or technical vocabulary. It is an early, open baseline; an ongoing voice-data collection project aims to broaden coverage with real spoken contributions.
— Open models for Cameroonian languages · lingo.cm