Gemma4-E2B-IT-Nepali
This repository contains a LoRA adapter for Google Gemma 4 E2B IT, trained with supervised fine-tuning (SFT) on Nepali data. The goal of the fine-tuning is to improve Nepali instruction following and Nepali conversational response generation, using the himalaya-ai/nepali-sft-dataset dataset.
Model Details
Model Description
This model is a PEFT/LoRA adapter trained on top of google/gemma-4-E2B-it. It is designed for Nepali instruction-following tasks, Nepali question answering, Nepali text generation, and simple Nepali chatbot-style interaction.
Because this repository contains only a LoRA adapter and not full model weights, the base model must be loaded first, and the adapter is then attached to it with the PEFT library.
- Developed by: Yuv Raj Pant and Himalaya AI Labs
- Shared by: Himalaya AI Labs
- Model type: PEFT LoRA adapter for causal language modeling
- Base model: google/gemma-4-E2B-it
- Dataset: himalaya-ai/nepali-sft-dataset
- Language(s): Nepali and English
- License: Apache 2.0
- Fine-tuning method: Supervised Fine-Tuning (SFT) with LoRA / QLoRA-style training
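For readers new to LoRA, this is why the adapter cannot be used on its own: it stores only small low-rank matrices that modify frozen base-model weights. In the standard LoRA formulation, and assuming PEFT's default scaling (not rsLoRA), each adapted frozen weight $W_0 \in \mathbb{R}^{d \times k}$ is augmented as shown below. With the rank $r = 32$ and $\alpha = 64$ listed under Training Configuration, the low-rank update is scaled by $\alpha / r = 2$. Which layers were adapted (the target modules) is not stated in this card.

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

$$
\frac{\alpha}{r} = \frac{64}{32} = 2
$$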
Intended Use
This model is intended for research, experimentation, and community demonstrations involving Nepali language AI.
Potential use cases include:
- Nepali instruction-following
- Nepali chatbot applications
- Nepali question answering
- Nepali text generation
- Nepali-English bilingual assistant workflows
- Educational AI demos for Nepali users
- Low-resource language research
Out-of-Scope Use
This model should not be used as the only source of truth in high-stakes settings such as medical, legal, financial, emergency, or safety-critical decision-making.
The model may generate incorrect, biased, incomplete, or hallucinated outputs. Human review is recommended for public-facing or production use.
Training Dataset
The model was fine-tuned on:
himalaya-ai/nepali-sft-dataset
The dataset was used for supervised instruction fine-tuning. Because the dataset provides only a training split, a small evaluation split (0.5% of the examples, split seed 42) was held out from the training data during preprocessing.
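The card does not say which tool performed this split. With the Hugging Face `datasets` library it would typically be `Dataset.train_test_split(test_size=0.005, seed=42)`; the dependency-free sketch below only illustrates a seeded, fraction-based holdout and will not select the same examples as the authors' split. The function name `split_train_eval` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_eval(
    examples: Sequence[T], eval_fraction: float = 0.005, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hold out a seeded random fraction of `examples` for evaluation (illustrative sketch)."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    # Shuffle indices with a private RNG so the split is reproducible for a given seed.
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    # Round to the nearest count, but keep at least one eval example when data exists.
    n_eval = max(1, round(len(examples) * eval_fraction)) if examples else 0
    eval_indices = set(indices[:n_eval])
    train = [ex for i, ex in enumerate(examples) if i not in eval_indices]
    evaluation = [examples[i] for i in indices[:n_eval]]
    return train, evaluation


# Usage on a hypothetical 10,000-example list: 9,950 train and 50 eval examples.
train, evaluation = split_train_eval(list(range(10_000)))
print(len(train), len(evaluation))
```

Using a private `random.Random(seed)` instead of the global generator keeps the split reproducible without affecting other random state in the training script.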
Training Configuration
| Setting | Value |
|---|---|
| Base model | google/gemma-4-E2B-it |
| Dataset | himalaya-ai/nepali-sft-dataset |
| Number of epochs | 1 |
| Max sequence length | 2048 |
| Per-device train batch size | 4 |
| Per-device eval batch size | 4 |
| Gradient accumulation steps | 4 |
| Effective batch size | 16 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup ratio | 0.03 |
| Weight decay | 0.0 |
| Max grad norm | 0.3 |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| LoRA dropout | 0.05 |
| Evaluation fraction | 0.005 |
| Split seed | 42 |
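Several rows in this table are linked. The sketch below recomputes the effective batch size and warmup length under stated assumptions: the card gives neither the GPU count nor the training-set size, so `num_devices=1` (the only value for which 4 × 4 × `num_devices` matches the listed effective batch size of 16) and the 100,000-example figure are illustrative. The class name `TrainConfig` is hypothetical, and step counts reported by a real trainer may differ slightly because of rounding.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values from the Training Configuration table (num_devices is an assumption)."""
    per_device_batch_size: int = 4
    grad_accum_steps: int = 4
    num_devices: int = 1  # not stated in the card; 1 is consistent with the listed 16
    num_epochs: int = 1
    warmup_ratio: float = 0.03

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update.
        return self.per_device_batch_size * self.grad_accum_steps * self.num_devices

    def optimizer_steps(self, num_train_examples: int) -> int:
        # Approximate: counts a final partial batch as a step; trainers may round differently.
        return math.ceil(num_train_examples / self.effective_batch_size) * self.num_epochs

    def warmup_steps(self, num_train_examples: int) -> int:
        # Warmup length as a fraction of all optimizer steps, rounded up.
        return math.ceil(self.warmup_ratio * self.optimizer_steps(num_train_examples))


cfg = TrainConfig()
print(cfg.effective_batch_size)      # 16
# Hypothetical dataset size; the card does not report the number of training examples.
print(cfg.optimizer_steps(100_000))  # 6250
print(cfg.warmup_steps(100_000))     # 188
```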
How to Use
Install the required packages:
pip install -U transformers peft accelerate bitsandbytes torch
Then load the base model and attach the LoRA adapter:
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
base_model_id = "google/gemma-4-E2B-it"
adapter_id = "himalaya-ai/gemma4-e2b-it-nepali"
# 4-bit NF4 quantization reduces memory use; bitsandbytes requires a CUDA GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
# Load the base model first...
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    dtype=torch.bfloat16,
)
# ...then attach the Nepali LoRA adapter on top of it.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
# Load the tokenizer from the adapter repo (if it ships no tokenizer files, use base_model_id instead).
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
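Before attaching the adapter, it can help to confirm which base model it was trained against. PEFT saves this in the adapter's `adapter_config.json` under `base_model_name_or_path`, next to LoRA fields such as `r`, `lora_alpha`, and `lora_dropout`. The helper below is a hypothetical sketch that parses that file's text; the sample JSON is illustrative and is not this repository's actual config. In practice the file could be fetched with `huggingface_hub.hf_hub_download(repo_id=adapter_id, filename="adapter_config.json")`.

```python
import json
from typing import Any, Dict

# LoRA fields to report; a real adapter_config.json contains many more keys.
LORA_FIELDS = ("peft_type", "r", "lora_alpha", "lora_dropout")


def check_adapter_config(config_text: str, expected_base: str) -> Dict[str, Any]:
    """Parse adapter_config.json text and verify the base model it targets (sketch)."""
    config = json.loads(config_text)
    base = config.get("base_model_name_or_path")
    if base != expected_base:
        raise ValueError(f"adapter targets {base!r}, not {expected_base!r}")
    # Return the LoRA hyperparameters stored next to the adapter weights.
    return {key: config.get(key) for key in LORA_FIELDS}


# Illustrative config text; the real file may contain different or additional fields.
sample = json.dumps({
    "base_model_name_or_path": "google/gemma-4-E2B-it",
    "peft_type": "LORA",
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
})
print(check_adapter_config(sample, "google/gemma-4-E2B-it"))
```

Failing early on a base-model mismatch gives a clearer error than the shape mismatches that can occur when an adapter is attached to the wrong model.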
Example Inference
import torch
@torch.inference_mode()
def chat(model, tokenizer, user_text, system=None):
    # Build the chat messages, with an optional system prompt first.
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})
    # Apply the chat template and move the input tensors to the model's device.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated tokens, skipping the prompt.
    input_length = inputs["input_ids"].shape[-1]
    new_tokens = outputs[0, input_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
system_prompt = "You are a helpful AI assistant that answers in Nepali."
prompt = "नेपालको राजधानी कहाँ हो?"  # "Where is the capital of Nepal?"
response = chat(model, tokenizer, prompt, system=system_prompt)
print(response)
Example Prompts
- नेपालको राजधानी कहाँ हो? ("Where is the capital of Nepal?")
Limitations
This model has not been fully benchmarked across all Nepali NLP tasks. It may produce hallucinated or factually incorrect answers, especially for questions requiring current information or specialized domain knowledge.
The model may also reflect biases present in the base model or fine-tuning dataset. Users should evaluate the model carefully for their specific use case.
Ethical Considerations
When deploying this model in public-facing applications, developers should consider adding safety filters, human review, and domain-specific evaluation. The model should not be used to produce harmful, deceptive, or high-risk advice.
Contributors
- Yuv Raj Pant
- Himalaya AI Labs
Acknowledgements
This model is based on Google DeepMind's Gemma 4 E2B IT model and was fine-tuned using the Himalaya AI Nepali SFT dataset.