Author: Ramy I. Jamea

JameaMT-ar-en-350M is a lightweight, decoder-only machine translation model that translates Arabic text into English. It is built on the Liquid Foundation Model architecture (LFM2.5-350M), which gives it fast inference and a small memory footprint, making it well suited to edge devices, CPU-only environments, and large-scale data processing pipelines. I fine-tuned the model on 1 million high-quality sentence pairs from the OPUS dataset using an NVIDIA H200 GPU.

Why Liquid Foundation Models (LFM2.5)?

I selected LiquidAI/LFM2.5-350M as the base model because of its hybrid architecture, which offers several advantages over traditional Transformer designs:

Radical Memory Efficiency: By replacing most self-attention layers with convolution blocks and multiplicative gates that need no KV cache, LFM reduces the KV cache footprint by up to 90%.
Scalability & Edge Deployment: The architecture has near-constant inference time and memory complexity. This allows high throughput without large amounts of VRAM, making the model fast on cloud GPUs and usable on consumer CPUs or NPUs.
Fewer Embedding Layers: The architecture is parameter-efficient and natively supports Arabic, so it tokenizes and processes the script effectively without being weighed down by bloated embedding layers.
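To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how the KV cache scales with the number of attention layers. The layer counts, head counts, and head dimension are hypothetical values chosen only for illustration; they are not the actual LFM2.5 configuration. The sketch also assumes the convolution blocks keep only a small fixed-size state that does not grow with sequence length.

```python
def kv_cache_bytes(attn_layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence (bf16 = 2 bytes)."""
    # The factor 2 covers the separate key and value tensors in each layer.
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical shapes, for illustration only (not the real LFM2.5 config).
full = kv_cache_bytes(attn_layers=16, kv_heads=8, head_dim=64, seq_len=4096)
hybrid = kv_cache_bytes(attn_layers=4, kv_heads=8, head_dim=64, seq_len=4096)
saving = 1 - hybrid / full

print(f"all-attention: {full / 2**20:.0f} MiB")
print(f"hybrid:        {hybrid / 2**20:.0f} MiB")
print(f"saving:        {saving:.0%}")
```

Because the cache grows linearly with the number of attention layers, every attention layer that is swapped for a cache-free block removes its share of the footprint at any sequence length.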

Training Loop

I trained the model on an NVIDIA H200 GPU, using mixed precision (bfloat16) for throughput and gradient checkpointing to reduce memory use.

Dataset

Source: OPUS Dataset
Volume: 1,000,000 parallel Arabic-English sentence pairs.

Hyperparameters

Epochs: 3
Train Batch Size: 256
Eval Batch Size: 256
Optimizer: AdamW (Fused)
Learning Rate: 5e-5
LR Scheduler: Cosine
Warmup Ratio: 10%
Max Grad Norm: 1.0
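
As a rough illustration of what these settings imply, the sketch below derives the step counts and the learning-rate curve from the numbers listed above. It assumes that all 1,000,000 pairs are used for training, that 256 is the effective global batch size (no gradient accumulation), and that the cosine schedule decays to zero after a linear warmup. None of these assumptions is confirmed by the training setup, so treat the numbers as approximate.

```python
import math

NUM_PAIRS = 1_000_000   # from the Dataset section; assumes no held-out split
BATCH_SIZE = 256        # assumed to be the effective global batch size
EPOCHS = 3
PEAK_LR = 5e-5
WARMUP_RATIO = 0.10

steps_per_epoch = math.ceil(NUM_PAIRS / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)  # rounded down in this sketch


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (an assumption)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"steps per epoch: {steps_per_epoch}, total: {total_steps}, warmup: {warmup_steps}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```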

Evaluation Loss Progress

[Figure: evaluation loss over the course of training]

Evaluation on 2000-Sample Test Set

The sections below explain the evaluation metrics and give a qualitative analysis of the model's strengths, based on samples from the test set.

1. Decoding the Evaluation Metrics

SacreBLEU (39.79): BLEU measures n-gram (word and phrase) overlap between the model's output and human references. A score approaching 40 is strong for Arabic-to-English translation and suggests that the model frequently produces understandable translations that need little human post-editing.
chrF (60.16): The character n-gram F-score evaluates translations at the character level rather than the word level, so it gives partial credit when a word differs from the reference only in inflection or spelling. A score above 60 is competitive and indicates that the output stays close to the references even where exact word matches are missed.
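
To show why a character-level metric is more forgiving than exact word matching, here is a simplified, pure-Python character n-gram F-score. It is a teaching sketch only, not the SacreBLEU implementation used for the scores above: the function name `simple_chrf` is hypothetical, whitespace is dropped, and precision and recall are averaged over n-gram orders before being combined. Its values will not reproduce the reported numbers.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring whitespace (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-beta score on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision.
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# A pair from the test samples: the wording differs, yet many characters overlap.
score = simple_chrf("The winner gets the girl.", "Winner gets the girl.")
print(f"simplified chrF: {score:.1f}")
```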

2. Why the Model is Highly Effective

The outputs below show that the model does not simply substitute word for word; it tracks context, tone, and grammar, and in some cases produces a better rendering than the human reference.

A. Perfect Fluency and Idiomatic Accuracy

The model handles everyday dialogue well, matching the human references exactly or almost exactly without sounding robotic.

Source	Model Prediction	Human Reference	Insight
هيّا يا مهدّئات العضلات.	Come on, muscle relaxers.	Come on, muscle relaxers.	Perfect Match: Captures conversational syntax perfectly.
الفائز يحصل على الفتاة	The winner gets the girl.	Winner gets the girl.	Natural Phrasing: Understands English colloquial structure without awkward literal translations.

B. Mastering Tone and Colloquialisms

A strong translation model must capture the feeling of the text, adapting to informal language, slang, or emotional weight.

Source	Model Prediction	Human Reference	Insight
فى رايى لا يوجد اسوأ من شرطى جبان ملعون.	In my mind, there's nothing worse than a goddamn cowardly cop.	There's nothin' worse in my book than a goddamn yellow cop.	Semantic Depth: The model correctly translates the aggressive tone and meaning. "Cowardly" is the exact meaning of the slang "yellow" used in the reference.
هي متبنّية أسم عصري.	She's adopting a modern name.	She's adopted a fashionable name.	Synonym Flexibility: "Modern" and "fashionable" are near-synonyms here, so the meaning is preserved even though the wording (and the tense, "adopting" vs. "adopted") differs from the reference.

C. Superiority in Formal and Complex Syntax

Perhaps the strongest indicator of the model's quality is its handling of dense, formal sentence structures, where it sometimes outperforms the provided human references.

Source	Model Prediction	Human Reference	Insight
وينبغي ألا تقدم التصويبات إلا للنص باللغات الأصلية.	Corrections should be submitted only in the original languages.	Corrections should be submitted to the original languages only.	Grammatical Superiority: The model's placement of "only in the original languages" actually flows better in formal English than the reference.
وكثيراً ما يكون ذلك التعاون في شكل وضع إطار مؤسسي لمتابعة مسائل الاستثمار...	Such cooperation is often in the form of the development of an institutional framework for follow-up on investment issues...	However, some agreements establish only a framework for cooperation between the contracting parties.	High Fidelity: The human reference here does not correspond to the source sentence, which points to a misaligned pair in the test data. The model's output, by contrast, renders the full institutional sentence faithfully.

Limitations

The Dialect Deficit (MSA vs. Colloquial Arabic): The most significant constraint stems from the dataset's exclusive reliance on Modern Standard Arabic (MSA / Fusha). Consequently, the model experiences a sharp performance degradation when processing regional dialects (e.g., Egyptian, Levantine, Gulf, Maghrebi) or informal, user-generated content. Because real-world applications—such as social media analysis or casual conversational text—frequently blend MSA with localized vernaculars, the model may struggle to accurately parse non-standard syntactic structures or colloquial slang.
Sensitivity to Source Text Noise and Formatting: Because the model relies on the highly structured nature of MSA, it expects a baseline of grammatical and orthographic correctness. Source text noise—such as missing or ambiguous diacritics (Tashkeel) that alter semantic meaning, spelling errors, or artifacts from upstream OCR extraction pipelines—can disproportionately impact the translation output.
Out-of-Distribution Domain Jargon: The model performs well within its primary training distribution but may fall back on literal translations when confronted with highly specialized jargon from domains it has not seen. Rapidly evolving technical, medical, or niche legal terminology may require targeted fine-tuning or integration with a Retrieval-Augmented Generation (RAG) pipeline to maintain high accuracy.
Document-Level Cohesion in Extended Contexts: The current evaluation highlights exceptional sentence-level and short-paragraph accuracy. However, as is common with highly optimized, parameter-efficient architectures, handling extensive, multi-page documents in a single pass may occasionally lead to losses in pronoun resolution or thematic cohesion, necessitating chunking strategies during inference.
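
One simple chunking strategy is sketched below, under stated assumptions: it splits on sentence-final punctuation (including the Arabic question mark) and packs sentences into chunks under a character budget. The helper name `chunk_text` and the 500-character default are hypothetical choices for illustration, not part of the model's tooling, and a punctuation-based splitter will miss sentence boundaries in unpunctuated text.

```python
import re

# Split on whitespace that follows sentence-final punctuation,
# including the Arabic question mark.
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The budget would be exceeded: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single sentence longer than max_chars is kept whole.
    return chunks


document = "جملة أولى. جملة ثانية؟ جملة ثالثة!"
for chunk in chunk_text(document, max_chars=15):
    print(chunk)
```

Each chunk can then be translated independently with the generation code in the How to Use section and the outputs joined in order. Pronouns that refer across chunk boundaries may still be resolved incorrectly.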

How to Use

To run this model, set trust_remote_code=True so that the custom Liquid architecture loads correctly.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ramyibrahim/JameaMT-ar-en-350M"

# 1. Load Tokenizer and Model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,  # older transformers releases name this argument torch_dtype
    device_map="auto",
)
model.eval()

# 2. Format the input using the chat template
sample_text = "تعالى قبل انقضاء الشباب وموت الحب"
messages = [{"role": "user", "content": sample_text}]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# 3. Generate the translation
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        num_beams=5,
        early_stopping=True,
        length_penalty=1.0,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# 4. Extract and print only the generated response
generated_tokens = output_ids[0][inputs["input_ids"].shape[1] :]
translation = tokenizer.decode(generated_tokens, skip_special_tokens=True)

print(f"Input: {sample_text}")
print(f"Translation: {translation}")

Model size: 0.4B params
Tensor type: BF16 (Safetensors)
Base model: LiquidAI/LFM2.5-350M-Base
Fine-tuned from: LiquidAI/LFM2.5-350M