How to use sarvamai/sarvam-1-v0.5 with libraries and local apps.

Libraries

How to use sarvamai/sarvam-1-v0.5 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sarvamai/sarvam-1-v0.5")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sarvamai/sarvam-1-v0.5")
model = AutoModelForCausalLM.from_pretrained("sarvamai/sarvam-1-v0.5")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use sarvamai/sarvam-1-v0.5 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sarvamai/sarvam-1-v0.5"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sarvamai/sarvam-1-v0.5",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
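
The model card text further down states that this checkpoint has not undergone any post-training, so a plain text-completion request may suit it better than a chat request. The following is a minimal sketch, assuming the vLLM server started above is listening on localhost:8000 and exposes the OpenAI-compatible `/v1/completions` route. The helper names `build_completion_request` and `complete` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "sarvamai/sarvam-1-v0.5"


def build_completion_request(prompt, max_tokens=32, temperature=0.0):
    """Return the JSON body for a POST to /v1/completions (hypothetical helper)."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url="http://localhost:8000", **kwargs):
    """Send a text-completion request and return the generated continuation.

    Assumption: an OpenAI-compatible server is already running at base_url.
    """
    body = json.dumps(build_completion_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # OpenAI-style completion responses carry the text in choices[0].text.
    return payload["choices"][0]["text"]
```

With the server running, `complete("भारत के प्रथम प्रधानमंत्री", max_tokens=15)` would return the model's continuation of the prompt. The same request shape should also work against the SGLang server described below by passing `base_url="http://localhost:30000"`, assuming its completions route is enabled.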

Use Docker

docker model run hf.co/sarvamai/sarvam-1-v0.5

SGLang

How to use sarvamai/sarvam-1-v0.5 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sarvamai/sarvam-1-v0.5" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sarvamai/sarvam-1-v0.5",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sarvamai/sarvam-1-v0.5" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sarvamai/sarvam-1-v0.5",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use sarvamai/sarvam-1-v0.5 with Docker Model Runner:
```
docker model run hf.co/sarvamai/sarvam-1-v0.5
```
Quantized versions of this model can be used in llama.cpp, Ollama, LM Studio, or any compatible app.

The fully trained version of this model is now available at https://huggingface.co/sarvamai/sarvam-1.

Update (Aug 15, 2024): You can now get started with text completions and supervised fine-tuning using this notebook on Google Colab.

This is an early checkpoint of sarvam-2b, a small yet powerful language model pre-trained from scratch on 2 trillion tokens. It is trained to perform well in 10 Indic languages and English. The officially supported Indic languages are Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.

The final checkpoint of sarvam-2b will be released soon. It will be trained on a data mixture of 4 trillion tokens, split equally between English (2T) and Indic (2T) tokens.

The current checkpoint has not undergone any post-training. You can see the capabilities of the current checkpoint in this video.

The model was trained with NVIDIA NeMo™ Framework on the Yotta Shakti Cloud using HGX H100 systems.

Getting started:

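from transformers import pipeline

# Load the checkpoint as a text-generation pipeline on the first GPU (device=0).
pipe = pipeline('text-generation', model='sarvamai/sarvam-2b-v0.5', device=0)
# This is a base checkpoint, so prompt it with text to be continued.
pipe('भारत के प्रथम प्रधानमंत्री', max_new_tokens=15, temperature=0.1, repetition_penalty=1.2)[0]['generated_text']
# 'भारत के प्रथम प्रधानमंत्री जवाहरलाल नेहरू थे।\n\n'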

Tokenizer

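sarvam-2b's tokenizer is built to be efficient for Indic languages. It has an average fertility score of ~2, where fertility is the average number of tokens produced per word (lower is better). This is significantly lower than the other models compared here.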

Here is a comparison of fertility scores between sarvam-2b and other popular models.

Language	Sarvam-2B	Llama-3.1	Gemma-2	GPT-4o
ben_Beng	2.07	8.02	3.72	2.34
eng_Latn	1.43	1.24	1.23	1.23
guj_Gujr	1.81	9.97	3.90	2.30
hin_Deva	1.40	2.67	1.96	1.65
kan_Knda	2.37	14.95	5.55	3.29
mal_Mlym	2.85	16.26	5.88	3.52
mar_Deva	1.77	3.99	3.20	2.56
ory_Orya	2.35	16.84	6.87	6.83
pan_Guru	1.68	8.19	3.37	2.72
tam_Taml	2.17	12.39	4.19	3.17
tel_Telu	2.14	13.30	4.57	3.06
Average	2.08	9.34	4.01	3.00
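
As a rough illustration of what a fertility score measures, the sketch below computes tokens per whitespace-separated word for any tokenizer function. It is an assumption about the general technique, not the procedure used to produce the table above: the corpus, the definition of a word, and the averaging method behind those numbers are not stated here. The names `fertility` and `toy_tokenize` are hypothetical.

```python
def fertility(texts, tokenize):
    """Average number of tokens per whitespace-separated word.

    `tokenize` is any callable that maps a string to a list of tokens.
    """
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("no words in input")
    return total_tokens / total_words


def toy_tokenize(text):
    """Stand-in tokenizer for illustration: split each word into 3-character pieces."""
    tokens = []
    for word in text.split():
        tokens.extend(word[i:i + 3] for i in range(0, len(word), 3))
    return tokens


# "hello" -> hel, lo and "world" -> wor, ld: 4 tokens over 2 words.
print(fertility(["hello world"], toy_tokenize))  # 2.0
```

To measure a real tokenizer, one could pass the `tokenize` method of a tokenizer loaded with `AutoTokenizer.from_pretrained` in place of `toy_tokenize`, along with sample sentences in the language of interest.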

More technical details like evaluations and benchmarking will be posted soon.


Model size: 3B params (Safetensors)

Tensor type: BF16
