Tags: Text Generation, Transformers, Safetensors, PyTorch, llama, facebook, meta, llama-3, Eval Results, text-generation-inference
Instructions for using meta-llama/Llama-3.2-1B with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use meta-llama/Llama-3.2-1B with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="meta-llama/Llama-3.2-1B")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

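By default the Transformers `text-generation` pipeline returns the prompt together with its continuation, and output length is controlled through generation arguments. The sketch below wraps the `pipe` object from the snippet above to cap the number of new tokens and return only the continuation. `generate_continuation` is a hypothetical helper name, and the defaults (64 new tokens, temperature 0.5) are illustrative assumptions, not values from this page; `max_new_tokens`, `do_sample`, `temperature`, and `return_full_text` are standard text-generation arguments.

```python
from typing import Any, Callable, Dict


def generate_continuation(
    pipe: Callable[..., Any],
    prompt: str,
    max_new_tokens: int = 64,
    temperature: float = 0.5,
) -> str:
    """Hypothetical helper: return only the text generated after `prompt`.

    `pipe` should behave like a transformers "text-generation" pipeline,
    which returns a list of dicts with a "generated_text" key.
    """
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    kwargs: Dict[str, Any] = {
        "max_new_tokens": max_new_tokens,  # cap on tokens generated after the prompt
        "return_full_text": False,         # omit the prompt from "generated_text"
    }
    if temperature > 0:
        # Sample with the given temperature.
        kwargs.update(do_sample=True, temperature=temperature)
    else:
        # Temperature 0 is treated here as greedy decoding (an assumption of this sketch).
        kwargs["do_sample"] = False
    outputs = pipe(prompt, **kwargs)
    return outputs[0]["generated_text"]
```

With the `pipe` from the snippet above, `generate_continuation(pipe, "Once upon a time,", max_new_tokens=50)` returns the continuation only, limited to at most 50 new tokens.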
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use meta-llama/Llama-3.2-1B with vLLM:
Install from pip and serve model

    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "meta-llama/Llama-3.2-1B"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "meta-llama/Llama-3.2-1B",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'

Use Docker
docker model run hf.co/meta-llama/Llama-3.2-1B
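The vLLM curl call can also be made from Python's standard library. This is a minimal sketch: `build_completion_request`, `parse_completion`, and `complete` are hypothetical helper names, and `parse_completion` assumes an OpenAI-style response body in which the generated text sits at `choices[0]["text"]`. The SGLang curl examples send the same JSON body to port 30000, so changing `base_url` should be enough to target that server.

```python
import json
import urllib.request
from typing import Any, Dict


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the same POST /v1/completions request as the curl example."""
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_completion(body: Dict[str, Any]) -> str:
    """Extract the first choice's text (assumes OpenAI-style response shape)."""
    return body["choices"][0]["text"]


def complete(base_url: str, model: str, prompt: str, timeout: float = 60.0, **params: Any) -> str:
    """Send the request to a running server and return the generated text."""
    request = build_completion_request(base_url, model, prompt, **params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return parse_completion(body)
```

With `vllm serve` running, `complete("http://localhost:8000", "meta-llama/Llama-3.2-1B", "Once upon a time,")` should return the same text the curl command prints under `choices`. Keeping request construction separate from sending makes the payload easy to inspect without a live server.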
- SGLang
How to use meta-llama/Llama-3.2-1B with SGLang:
Install from pip and serve model

    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "meta-llama/Llama-3.2-1B" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "meta-llama/Llama-3.2-1B",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'

Use Docker images

    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "meta-llama/Llama-3.2-1B" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "meta-llama/Llama-3.2-1B",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'

- Docker Model Runner
How to use meta-llama/Llama-3.2-1B with Docker Model Runner:
docker model run hf.co/meta-llama/Llama-3.2-1B
Community discussions:
- #29 "Request: DOI" (opened over 1 year ago by reddy03; 2 comments)
- #28 "Request: DOI" (opened over 1 year ago by Natwar)
- #24 "Request: DOI" (opened over 1 year ago by romanbot; 3 comments)
- #23 "Does anyone try to translate Llama model to MLIR ?" (opened over 1 year ago by hmsjwzb)
- #22 "how use it?" (opened over 1 year ago by SweetPotato26; 1 comment)
- #21 "Trouble connecting to ONNX" (opened over 1 year ago by shaunak404; 1 comment)
- #20 "Reproduction of MATH results" (opened over 1 year ago by fzyzcjy; 1 comment)
- #19 "Error when using Inference API" (opened over 1 year ago by Krooz; 1 comment)
- #18 "need new version of torchao" (opened over 1 year ago by bdytx5; 2 comments)
- #17 "Is the BOS token id of 128000 hardcoded into the llama 3.2 tokenizer?" (opened over 1 year ago by rasyosef; 2 comments)
- #16 "Output length control" (opened over 1 year ago by kushr)
- #15 "transformers AutoModelForCausalLM, AutoTokenizer usage problem." (opened over 1 year ago by sleepcat; 1 comment)
- #14 "Examples of usage" (opened over 1 year ago by ernestyalumni; 5 comments)
- #13 "Error when evaluate this model using pipeline" (opened over 1 year ago by xiaxin1998; 2 comments)
- #12 "socket hung up" (opened over 1 year ago by MrCopperField; 1 comment)
- #11 "RuntimeError: shape '[1, 5, 32, 64]' is invalid for input of size 2560" (opened over 1 year ago by eemberda; 1 comment; 👀 1)
- #10 "md5sums of files" (opened over 1 year ago by ernestyalumni)