français → Ghomala — French-pivot MT (fp32)

Original fp32 MarianMT model translating French (français) into Ghomálá', one of the Cameroonian-language models in the Lingo / NativeAI language-preservation project (first released September 2024).

Translation runs through French as a pivot (e.g. Ghomálá' → français → Ewondo), so each language needs only two models: X→français and français→X. The models were trained on a corpus we compiled by hand, with scripture as the aligned backbone plus books, pamphlets and other written material we gathered, all normalised to the AGLC (Alphabet Général des Langues Camerounaises, the General Alphabet of Cameroon Languages). Background: the research log.
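The pivot idea can be sketched in a few lines. This is a minimal illustration, not the project's serving code: `pivot_translate`, `models_needed` and the two stand-in translators are hypothetical names, and the stand-ins only tag the text so the chaining is visible. In practice each callable would wrap one of the X→français or français→X models.

```python
from typing import Callable, Dict

# A translator is anything that maps a sentence to a sentence.
Translator = Callable[[str], str]


def pivot_translate(text: str, to_french: Translator, from_french: Translator) -> str:
    """Translate source -> français -> target by chaining two one-way models."""
    french = to_french(text)
    return from_french(french)


def models_needed(n_languages: int) -> Dict[str, int]:
    """Count one-way models needed to connect n local languages and French."""
    return {
        # One X→français and one français→X model per local language.
        "pivot": 2 * n_languages,
        # One model per ordered pair among the n local languages plus French.
        "direct": n_languages * (n_languages + 1),
    }


# Hypothetical stand-ins: they only tag the text, they do not translate.
def fake_source_to_french(text: str) -> str:
    return f"fr({text})"


def fake_french_to_target(text: str) -> str:
    return f"target({text})"


print(pivot_translate("hello", fake_source_to_french, fake_french_to_target))
print(models_needed(3))
```

With three local languages the pivot design needs 6 one-way models where direct translation between every pair would need 12, and the gap widens as languages are added. The trade-off is that errors from the first hop carry into the second.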

Compressed serving bundle

For deployment we serve int8 CTranslate2 conversions (~3.8× smaller, ~6× faster on CPU), bundled at flagship-ai/cameroon-int8. The fp32 weights in this repository are the original reference weights.
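As a rough sanity check on the size figure, the arithmetic below estimates weight storage from the 74.7M parameter count listed for this model. It is a back-of-the-envelope sketch: `weights_mb` is a hypothetical helper, 1 MB is taken as 1e6 bytes, and tokenizer or config files are ignored.

```python
def weights_mb(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return n_params * bytes_per_param / 1e6


N_PARAMS = 74.7e6       # parameter count listed for this model
REPORTED_SHRINK = 3.8   # the "~3.8x smaller" figure quoted for the int8 bundle

fp32_mb = weights_mb(N_PARAMS, 4)         # float32: 4 bytes per parameter
int8_ideal_mb = weights_mb(N_PARAMS, 1)   # int8: 1 byte per parameter, no overhead
int8_reported_mb = fp32_mb / REPORTED_SHRINK

print(f"fp32 weights:           ~{fp32_mb:.1f} MB")
print(f"int8, ideal 4x shrink:  ~{int8_ideal_mb:.1f} MB")
print(f"int8, reported shrink:  ~{int8_reported_mb:.1f} MB")
```

Going from 4 bytes to 1 byte per parameter gives at most a 4× reduction, so the reported ~3.8× is close to that ceiling.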

Usage

from transformers import MarianMTModel, MarianTokenizer

# Load the fp32 français → Ghomálá' weights.
model = MarianMTModel.from_pretrained("flagship-ai/francais-ghomala")
# The shared SentencePiece tokenizer ships in the int8 bundle subfolders:
tok = MarianTokenizer.from_pretrained("flagship-ai/cameroon-int8", subfolder="francais-ghomala")

# Tokenize the French input, generate, and decode the first hypothesis.
inputs = tok("Bonjour, comment vas-tu ?", return_tensors="pt")
output_ids = model.generate(**inputs)
print(tok.decode(output_ids[0], skip_special_tokens=True))

Limitations

Trained on a small, formal-leaning corpus, so it is strongest on everyday/simple sentences and weaker on modern or technical vocabulary. It is an early, open baseline — an ongoing voice-data collection project aims to broaden coverage with real spoken contributions.

— Open models for Cameroonian languages · lingo.cm

Model size: 74.7M params (Safetensors, tensor type F32)