---
library_name: peft
base_model: google/gemma-3-4b-it
language:
- nso
- en
tags:
- translation
- african-languages
- scientific-translation
- afriscience-mt
- lora
- peft
- gemma
license: apache-2.0
pipeline_tag: translation
model-index:
- name: gemma_3_4b_it-lora-r64-nso-eng
  results:
  - task:
      type: translation
    metrics:
    - name: BLEU (test)
      type: bleu
      value: 39.14
    - name: chrF (test)
      type: chrf
      value: 59.10
    - name: SSA-COMET (test)
      type: comet
      value: 65.56
---

# gemma_3_4b_it-lora-r64-nso-eng

[![Model on HF](https://huggingface.co/datasets/huggingface/badges/raw/main/model-on-hf-sm.svg)](https://huggingface.co/dsfsi/gemma_3_4b_it-lora-r64-nso-eng)

This is a **LoRA adapter** from the AfriScience-MT project. It adapts Gemma 3 4B (instruction-tuned) for scientific machine translation from Northern Sotho into English.

## Adapter Description

| Property | Value |
|----------|-------|
| **Base Model** | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) |
| **Translation Direction** | Northern Sotho → English |
| **LoRA Rank (r)** | 64 |
| **LoRA Alpha** | 128 |
| **Training Method** | QLoRA (4-bit quantization) |
| **Domain** | Scientific/Academic texts |

### Why LoRA?

LoRA (Low-Rank Adaptation) keeps the base model's weights frozen and trains only small low-rank update matrices added to selected layers. As a result, this adapter adds only **~32.0M parameters** to the base model while achieving strong translation performance.
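To illustrate what the rank `r` and `lora_alpha` settings control, the sketch below applies a LoRA update to a single linear layer. The frozen weight `W` is augmented with a low-rank product `B @ A`, scaled by `alpha / r` (128 / 64 = 2 for this adapter). This is a conceptual NumPy sketch, not PEFT's implementation. The helper names `lora_linear` and `lora_param_count` and the layer size of 1024 are hypothetical toy values, not Gemma 3's real dimensions.

```python
import numpy as np


def lora_linear(x, W, A, B, alpha, r):
    """Forward pass of a linear layer with a LoRA update (conceptual sketch).

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    scaling = alpha / r  # 128 / 64 = 2.0 with this adapter's settings
    return x @ W.T + scaling * (x @ A.T @ B.T)


def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one (d_out x d_in) layer: r * (d_in + d_out)."""
    return r * (d_in + d_out)


# Hypothetical toy layer size, chosen only for illustration
d_in, d_out, r, alpha = 1024, 1024, 64, 128
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B starts at zero (as in the LoRA paper), so the update is a no-op at init

x = rng.standard_normal((1, d_in))
assert np.allclose(lora_linear(x, W, A, B, alpha, r), x @ W.T)

print(lora_param_count(d_in, d_out, r))  # trainable LoRA parameters for this layer
print(d_in * d_out)                      # frozen parameters in W
```

For this toy layer, LoRA trains 131,072 parameters, compared with the 1,048,576 frozen parameters in `W` (12.5%). In PEFT, the two settings correspond to the `r` and `lora_alpha` fields of `LoraConfig`. One such update pair is added to each target module listed under Training Details.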
## Evaluation Results

Performance on the AfriScience-MT validation and test sets:

| Split | BLEU | chrF | SSA-COMET |
|-------|------|------|-----------|
| Validation | 43.93 | 63.31 | 66.87 |
| **Test** | **39.14** | **59.10** | **65.56** |

**Metrics explanation:**

- **BLEU**: Measures n-gram overlap with reference translations (0-100, higher is better).
- **chrF**: Character n-gram F-score, which is more robust for morphologically rich languages (0-100, higher is better).
- **SSA-COMET**: Neural evaluation metric trained for Sub-Saharan African languages, reported on a 0-100 scale (higher is better) ([McGill-NLP/ssa-comet-stl](https://huggingface.co/McGill-NLP/ssa-comet-stl)).

## Usage

### Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Configure 4-bit quantization (recommended for memory efficiency)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")

# Load LoRA adapter
adapter_name = "dsfsi/gemma_3_4b_it-lora-r64-nso-eng"
model = PeftModel.from_pretrained(base_model, adapter_name)
model.eval()

# Prepare translation prompt (the adapter expects Northern Sotho input)
source_text = "<Northern Sotho scientific text>"  # replace with your Northern Sotho source text
instruction = "Translate the following Northern Sotho scientific text to English."

# Format for Gemma chat template
messages = [{"role": "user", "content": f"{instruction}\n\n{source_text}"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate translation
# The chat template already includes <bos>, so don't let the tokenizer add a second one
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        num_beams=5,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens
generated = outputs[0][inputs["input_ids"].shape[1]:]
translation = tokenizer.decode(generated, skip_special_tokens=True)
print(translation)
```

### Without Quantization (bfloat16)

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model in bfloat16 without quantization (requires more VRAM; see Hardware Requirements)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, "dsfsi/gemma_3_4b_it-lora-r64-nso-eng")
```

## Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Batch Size | 2 |
| Learning Rate | 2e-4 |
| Max Sequence Length | 512 |
| Gradient Accumulation | 4 |

### Hardware Requirements

| Configuration | VRAM Required |
|---------------|---------------|
| 4-bit (QLoRA) | ~8-12 GB |
| 8-bit | ~16-20 GB |
| Full precision | ~24-40 GB |

## Reproducibility

To reproduce this adapter:

```bash
# Clone the AfriScience-MT repository
git clone https://github.com/afriscience-mt/afriscience-mt.git
cd afriscience-mt

# Install dependencies
pip install -r requirements.txt

# Run LoRA training
python -m afriscience_mt.scripts.run_lora_training \
    --data_dir ./data \
    --source_lang nso \
    --target_lang eng \
    --model_name google/gemma-3-4b-it \
    --model_type gemma \
    --lora_rank 64 \
    --output_dir ./output \
    --num_epochs 3 \
    --batch_size 4 \
    --load_in_4bit
```

## Limitations

- **Domain Specificity**: Optimized for scientific/academic texts; may underperform on casual or colloquial language.
- **Language Direction**: Only supports Northern Sotho → English translation.
- **Base Model Required**: Must be used with the [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) base model.
- **Context Length**: The adapter was trained with a maximum sequence length of 512 tokens, so longer texts should be split into shorter segments (e.g., sentences or paragraphs) before translation.

## Citation

If you use this model, please cite the AfriScience-MT paper ([arXiv:2605.29741](https://arxiv.org/abs/2605.29741)):

```bibtex
@article{abdulmumin2026afriscience,
  title   = {AfriScience-MT: Towards Decolonizing Science in Africa through Text Translation},
  author  = {Abdulmumin, Idris and Gwadabe, Tajuddeen and Muhammad, Shamsuddeen Hassan and Adelani, David Ifeoluwa and Khalo, Nomonde and Ahmad, Ibrahim Said and Modupe, Abiodun and Mumm, Anina and Biyela, Sibusiso and Rabie, Michelle and Havemann, Johanna and Rei, Marek and Abbott, Jade and Marivate, Vukosi},
  journal = {arXiv preprint arXiv:2605.29741},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.29741}
}
```

## License

This adapter is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

## Acknowledgments

- Base model: [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
- LoRA implementation: [PEFT](https://github.com/huggingface/peft)
- Evaluation: [SSA-COMET](https://huggingface.co/McGill-NLP/ssa-comet-stl) for African language assessment