Uploaded finetuned model

Developed by: gateremark
License: apache-2.0
Finetuned from model: google/translategemma-4b-it

This Gemma3 / TranslateGemma model was trained with Unsloth and Hugging Face's TRL library.

Kikuyu TranslateGemma-4B V7

Fine-tuned English -> Kikuyu translation model based on Google's TranslateGemma-4B-it.

This is the current fast production model behind C-elo Translate. It was trained as a smaller, faster alternative to the earlier 12B model, and it improves on that model in both automatic evaluation scores and manually reviewed translation quality.

Live demo: c-elo.com/c-elo-ai

Previous 12B model: gateremark/kikuyu_translategemma_12b_merged_V2

Model Details

Attribute	Value
Base model	google/translategemma-4b-it
Model family	TranslateGemma / Gemma3
Hub size	~5B parameters, BF16 safetensors
Fine-tuning method	rsLoRA (rank-stabilized LoRA), high rank
LoRA rank / alpha	r=256, alpha=256
Training data	30,430 English-Kikuyu pairs
Direction	English -> Kikuyu
BLEU	21.93
chrF++	42.87
Eval loss	0.7518
Framework	Unsloth + TRL + Transformers
Training platform	Modal, NVIDIA H100

Why This Model

The earlier 12B Kikuyu TranslateGemma model reached 19.61 BLEU, but it was large and slow to cold-start in production. This V7 4B-family model is smaller, faster to load, and scores higher on the same held-out split:

Model	BLEU	chrF++	Notes
12B LoRA V2	19.61	-	Earlier production model
4B V2 LoRA r256	17.76	38.31	Strong first 4B run
4B V3 DoRA r128	15.81	35.88	DoRA did not improve this setup
4B V6 LoRA r256, 4 epochs	17.67	38.28	Longer training did not improve V2
4B V7 rsLoRA r256	21.93	42.87	Current champion
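To make the gaps in the table easier to read, the sketch below computes the absolute and relative BLEU gain of V7 over each earlier run. The scores are copied from the table; the `bleu_gain` helper is a hypothetical name introduced only for this illustration.

```python
# BLEU scores copied from the comparison table above.
bleu = {
    "12B LoRA V2": 19.61,
    "4B V2 LoRA r256": 17.76,
    "4B V3 DoRA r128": 15.81,
    "4B V6 LoRA r256, 4 epochs": 17.67,
    "4B V7 rsLoRA r256": 21.93,
}


def bleu_gain(baseline, candidate="4B V7 rsLoRA r256"):
    """Return (absolute BLEU gain, relative gain in percent) of candidate over baseline."""
    absolute = bleu[candidate] - bleu[baseline]
    relative = 100 * absolute / bleu[baseline]
    return round(absolute, 2), round(relative, 1)


for name in bleu:
    if name != "4B V7 rsLoRA r256":
        print(name, bleu_gain(name))
```

Relative gains on BLEU are only a rough guide, since BLEU is not a linear measure of translation quality.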

Usage

Recommended: Unsloth / Gemma3Processor Path

This model uses the TranslateGemma/Gemma3 chat template. For reliable generation, use the processor for apply_chat_template() and the underlying text tokenizer for tokenization/decoding.

import torch
from unsloth import FastLanguageModel

model_id = "gateremark/kikuyu_translategemma_4b_v7_highrank_rslora"

model, processor = FastLanguageModel.from_pretrained(
    model_name=model_id,
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=False,  # Set True if you need lower VRAM and accept possible quality changes.
)

text_tokenizer = (
    getattr(processor, "tokenizer", None)
    or getattr(processor, "text_tokenizer", None)
    or processor
)

if text_tokenizer.pad_token_id is None:
    text_tokenizer.pad_token = text_tokenizer.eos_token

model.config.pad_token_id = text_tokenizer.pad_token_id
text_tokenizer.padding_side = "left"
FastLanguageModel.for_inference(model)

terminators = []
for token_id in [
    text_tokenizer.eos_token_id,
    text_tokenizer.convert_tokens_to_ids("<end_of_turn>"),
    text_tokenizer.convert_tokens_to_ids("<eos>"),
]:
    if (
        isinstance(token_id, int)
        and token_id >= 0
        and token_id != getattr(text_tokenizer, "unk_token_id", None)
        and token_id not in terminators
    ):
        terminators.append(token_id)


def translate_to_kikuyu(text: str) -> str:
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "source_lang_code": "en",
                    "target_lang_code": "ki",
                    "text": text,
                }
            ],
        }
    ]

    formatted_text = processor.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    inputs = text_tokenizer(
        [formatted_text],
        return_tensors="pt",
        padding=True,
    )
    inputs = {key: value.to(model.device) for key, value in inputs.items()}

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=128,
            do_sample=False,
            eos_token_id=terminators,
            pad_token_id=text_tokenizer.pad_token_id,
        )

    input_len = inputs["input_ids"].shape[1]
    response = text_tokenizer.decode(
        outputs[0][input_len:],
        skip_special_tokens=True,
    )
    return response.strip()


print(translate_to_kikuyu("Hello, how are you?"))
# Example output: Ndũmĩrĩrie, ũraigua atĩa?

Minimal Inference Notes

Use source_lang_code="en" and target_lang_code="ki" for English -> Kikuyu.
Use left padding for batched generation with decoder-only models.
Deterministic decoding (do_sample=False) is recommended for translation.
The model is trained for English -> Kikuyu. Reverse Kikuyu -> English was not part of this run.
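The left-padding note above can be illustrated without any model. The sketch below pads lists of token IDs on the left so that every prompt ends at the same position, which is where a decoder-only model starts generating. The `left_pad` helper and the token IDs are hypothetical; in practice the tokenizer does this when `padding_side = "left"` is set, as in the usage example.

```python
def left_pad(sequences, pad_id):
    """Pad token-id lists on the left and build the matching attention masks."""
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = width - len(seq)
        # Padding goes first, so the real tokens sit flush against the right edge.
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Two made-up prompts of different lengths (illustrative token IDs only).
batch = [[11, 12, 13, 14], [21, 22]]
ids, mask = left_pad(batch, pad_id=0)
for row, row_mask in zip(ids, mask):
    print(row, row_mask)
```

With right padding, the shorter prompt would end in pad tokens and generation would continue from those instead of from the last real token.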

Training Details

Dataset

Dataset: gateremark/english-kikuyu-translations
Size: 30,430 parallel English-Kikuyu sentence pairs
Split: 95% train / 5% eval
Train examples: 28,908
Eval examples: 1,522
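The split sizes above are consistent with a 5% held-out fraction rounded up to a whole example. The check below is a sketch under that assumption; the card does not state how the split was produced, so the rounding rule here is a guess that happens to reproduce the reported counts.

```python
import math

total_pairs = 30_430   # dataset size reported above
eval_fraction = 0.05   # 5% held out for evaluation

# Assumption: the eval size is rounded up and the remainder is used for training.
eval_examples = math.ceil(total_pairs * eval_fraction)
train_examples = total_pairs - eval_examples

print(train_examples, eval_examples)
```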

V7 Hyperparameters

Parameter	Value
Method	rsLoRA
LoRA rank	256
LoRA alpha	256
LoRA dropout	0
DoRA	False
Epochs	3
Learning rate	1e-4
Effective batch size	32
Optimizer	AdamW 8-bit
Weight decay	0.01
Precision	BF16
Max sequence length	4096

Target Modules

[
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "down_proj",
]
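The adapter settings above can be gathered into one place, and the effect of rsLoRA at this rank can be worked out by hand. Standard LoRA scales the adapter update by alpha / r, while rank-stabilized LoRA scales it by alpha / sqrt(r). The sketch below is not the training script: the dictionary keys are assumed to mirror the argument names of PEFT's `LoraConfig`, and `lora_scaling` is a hypothetical helper written for this illustration.

```python
import math

# Adapter settings from the tables above. Key names are assumed to follow
# PEFT's LoraConfig arguments; this dict is illustrative, not the author's config.
lora_settings = {
    "r": 256,
    "lora_alpha": 256,
    "lora_dropout": 0.0,
    "use_rslora": True,
    "use_dora": False,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def lora_scaling(r, lora_alpha, use_rslora):
    """Multiplier applied to the low-rank update B @ A."""
    if use_rslora:
        return lora_alpha / math.sqrt(r)  # rank-stabilized scaling
    return lora_alpha / r                 # standard LoRA scaling


standard = lora_scaling(lora_settings["r"], lora_settings["lora_alpha"], use_rslora=False)
stabilized = lora_scaling(lora_settings["r"], lora_settings["lora_alpha"], use_rslora=True)
print(f"standard LoRA scaling: {standard}")
print(f"rsLoRA scaling:        {stabilized}")
```

At r=256 and alpha=256, the standard factor is 1 and the rank-stabilized factor is 16, so the same rank and alpha give a much larger effective update under rsLoRA.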

Evaluation

Evaluation was run on the held-out 5% split from the same dataset using BLEU and chrF++.

Metric	Score
BLEU	21.93
chrF++	42.87
Eval loss	0.7518

Automatic metrics are useful for regression testing, but Kikuyu quality should also be checked with native-speaker review because morphology, idiom, tone, and dialect variation are not fully captured by BLEU.
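To show what a character-level metric rewards, here is a simplified sketch of a chrF-style score: an F-beta score over character n-gram precision and recall, averaged across n-gram orders. It is not the evaluation code behind the scores above, and it omits the word n-gram component that distinguishes chrF++, so its numbers are not comparable with the reported 42.87. The function names and defaults are assumptions made for this illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_order=6, beta=2.0):
    """Simplified sentence-level chrF on a 0-100 scale (character n-grams only)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped n-gram overlap
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    # beta=2 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


# The two renderings of "Hello, how are you?" that appear in this card.
print(simple_chrf("Hihi, ũrĩ atĩa?", "Ndũmĩrĩrie, ũraigua atĩa?"))
```

Because the score works on character n-grams, two translations that share stems and affixes get partial credit even when no whole word matches, which suits a morphologically rich language like Kikuyu better than word-level overlap alone.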

Sample Translations

English	Kikuyu
Hello, how are you?	Hihi, ũrĩ atĩa?
The weather is beautiful today.	Rĩera nĩ rĩega mũno ũmũthĩ
I love learning new languages.	Nĩ nyendete kwĩruta thiomi njerũ

Intended Use

English -> Kikuyu translation tools
Kikuyu language learning applications
Low-resource African language NLP research
Cultural and linguistic preservation projects
Prototyping multilingual AI interfaces for Kikuyu speakers

Limitations

Direction: English -> Kikuyu only. Kikuyu -> English was not trained in this run.
Language coverage: Optimized for Kikuyu (ki), not other Gikuyu-related dialects or neighboring Bantu languages.
Domain: Best for general text. Technical, legal, medical, poetic, or highly idiomatic content may need human review.
Evaluation: BLEU and chrF++ do not fully measure naturalness, dialect fit, or cultural nuance.
Production use: Review outputs before high-stakes use.

Citation

@misc{gatere2026kikuyutranslategemma4bv7,
  author = {Mark Gatere},
  title = {Kikuyu TranslateGemma-4B V7: rsLoRA Fine-tuning for English to Kikuyu Translation},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/gateremark/kikuyu_translategemma_4b_v7_highrank_rslora}}
}

Acknowledgments

Google for TranslateGemma-4B-it
Unsloth for efficient fine-tuning
Hugging Face for model and dataset hosting
Modal for GPU training and deployment infrastructure
Kikuyu speakers and reviewers supporting C-elo's translation work
