The fully-trained version of this model is now available at https://huggingface.co/sarvamai/sarvam-1
Update (Aug 15, 2024): You can now get started with text completions and supervised fine-tuning using this notebook on Google Colab!
This is an early checkpoint of sarvam-2b, a small yet powerful language model pre-trained from scratch on 2 trillion tokens. It is trained to perform well in 10 Indic languages and English. Officially, the supported Indic languages are Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.
The final checkpoint of sarvam-2b will be released soon. It will be trained on a 4-trillion-token data mixture containing equal parts English (2T) and Indic (2T) tokens.
The current checkpoint has not undergone any post-training. You can see the capabilities of the current checkpoint in this video.
The model was trained with NVIDIA NeMo™ Framework on the Yotta Shakti Cloud using HGX H100 systems.
Getting started:
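```python
from transformers import pipeline
# device=0 uses the first CUDA GPU; pass device=-1 to run on CPU.
pipe = pipeline(model='sarvamai/sarvam-2b-v0.5', device=0)
# Base model: give it a prefix to complete. Prompt: "India's first Prime Minister"
pipe('भारत के प्रथम प्रधानमंत्री', max_new_tokens=15, temperature=0.1, repetition_penalty=1.2)[0]['generated_text']
# 'भारत के प्रथम प्रधानमंत्री जवाहरलाल नेहरू थे।\n\n'  ("India's first Prime Minister was Jawaharlal Nehru.")
```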
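The call above passes `temperature` and `repetition_penalty`. To illustrate roughly what these do to the next-token logits, here is a minimal pure-Python sketch modeled on the CTRL-style penalty used by Hugging Face's `RepetitionPenaltyLogitsProcessor` (logits of already-seen tokens are divided by the penalty if non-negative and multiplied by it if negative) followed by plain temperature scaling. The helper names and toy logits are hypothetical. Note that in Transformers, `temperature` only takes effect when sampling is enabled (`do_sample=True`), while the repetition penalty also applies to greedy decoding.

```python
import math
from typing import List, Sequence


def apply_repetition_penalty(logits: Sequence[float], seen_ids: Sequence[int], penalty: float) -> List[float]:
    """Lower the score of every token that already appears in the context."""
    out = list(logits)
    for tok in set(seen_ids):
        # With penalty > 1, multiplying a negative logit or dividing a
        # non-negative one both move the score down (or leave 0 unchanged).
        out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-token vocabulary; tokens 0 and 1 were already generated.
logits = [2.0, -1.0, 0.5]
penalized = apply_repetition_penalty(logits, seen_ids=[0, 1], penalty=1.2)
probs = softmax_with_temperature(penalized, temperature=0.1)
print(penalized)
print([round(p, 4) for p in probs])
```

At `temperature=0.1` almost all probability mass lands on the highest-scoring token, which is why such a low temperature behaves close to greedy decoding.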
Tokenizer
sarvam-2b's tokenizer is built to be efficient for Indic languages: its average fertility score (the average number of tokens produced per word) is ~2, significantly lower than that of other popular models.
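As a rough illustration of how a fertility score is computed, here is a minimal sketch that divides the total token count by the total word count over a corpus. It assumes words are delimited by whitespace and uses a hypothetical toy tokenizer (`toy_char_pair_tokenize`) in place of a real one; the exact corpus and word segmentation behind the numbers below are not specified here.

```python
from typing import Callable, Iterable, List


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens produced per whitespace-delimited word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_words += len(text.split())
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("corpus contains no words")
    return total_tokens / total_words


def toy_char_pair_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: splits each word into 2-character chunks."""
    return [w[i:i + 2] for w in text.split() for i in range(0, len(w), 2)]


corpus = ["the cat sat", "tokenizers matter"]
print(fertility(corpus, toy_char_pair_tokenize))
```

With a real Hugging Face tokenizer, `tokenizer.tokenize` has the required `str -> list[str]` shape and can be passed in place of the toy function.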
Here is a comparison of fertility scores (lower is better) between sarvam-2b and other popular models.
| Language | Sarvam-2B | Llama-3.1 | Gemma-2 | GPT-4o |
|---|---|---|---|---|
| ben_Beng | 2.07 | 8.02 | 3.72 | 2.34 |
| eng_Latn | 1.43 | 1.24 | 1.23 | 1.23 |
| guj_Gujr | 1.81 | 9.97 | 3.9 | 2.3 |
| hin_Deva | 1.4 | 2.67 | 1.96 | 1.65 |
| kan_Knda | 2.37 | 14.95 | 5.55 | 3.29 |
| mal_Mlym | 2.85 | 16.26 | 5.88 | 3.52 |
| mar_Deva | 1.77 | 3.99 | 3.2 | 2.56 |
| ory_Orya | 2.35 | 16.84 | 6.87 | 6.83 |
| pan_Guru | 1.68 | 8.19 | 3.37 | 2.72 |
| tam_Taml | 2.17 | 12.39 | 4.19 | 3.17 |
| tel_Telu | 2.14 | 13.3 | 4.57 | 3.06 |
| Average | 2.08 | 9.34 | 4.01 | 3.00 |
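Because fertility is measured in tokens per word, the ratio of two tokenizers' scores on the same language roughly indicates how many times more tokens one needs for the same text, and therefore how much more context window and per-token compute it consumes. A small sketch using a few rows copied from the table above (the helper name `token_ratio` is hypothetical):

```python
# Fertility scores copied from the table (lower is better).
FERTILITY = {
    "hin_Deva": {"Sarvam-2B": 1.4, "Llama-3.1": 2.67, "Gemma-2": 1.96, "GPT-4o": 1.65},
    "mal_Mlym": {"Sarvam-2B": 2.85, "Llama-3.1": 16.26, "Gemma-2": 5.88, "GPT-4o": 3.52},
    "eng_Latn": {"Sarvam-2B": 1.43, "Llama-3.1": 1.24, "Gemma-2": 1.23, "GPT-4o": 1.23},
}


def token_ratio(lang: str, model: str, baseline: str = "Sarvam-2B") -> float:
    """Approximate tokens `model` needs per token `baseline` uses on the same text."""
    scores = FERTILITY[lang]
    return scores[model] / scores[baseline]


for lang in FERTILITY:
    for model in ("Llama-3.1", "Gemma-2", "GPT-4o"):
        print(f"{lang}: {model} uses {token_ratio(lang, model):.2f}x the tokens of Sarvam-2B")
```

For English the ratios fall below 1, consistent with the table showing Sarvam-2B as slightly less efficient than the other tokenizers there.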
More technical details like evaluations and benchmarking will be posted soon.