Tags: Text Generation, Transformers, Safetensors, English, llama, fine-tuned, lora, sft, auto-sft, conversational, text-generation-inference
Instructions for using theprint/Llama3.2-3B-Explained with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use theprint/Llama3.2-3B-Explained with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="theprint/Llama3.2-3B-Explained")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly (Llama 3.2 3B is a text-only causal LM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("theprint/Llama3.2-3B-Explained")
model = AutoModelForCausalLM.from_pretrained("theprint/Llama3.2-3B-Explained")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use theprint/Llama3.2-3B-Explained with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "theprint/Llama3.2-3B-Explained"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "theprint/Llama3.2-3B-Explained",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/theprint/Llama3.2-3B-Explained
```
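The curl call above can also be made from Python. This is a minimal sketch using only the standard library (`urllib.request` and `json`); it assumes an OpenAI-compatible server is already running, such as the vLLM server above on its default port 8000. The helper names `build_chat_request` and `chat` are hypothetical, and the same helpers should work against the SGLang server described next by passing `http://localhost:30000` as the base URL.

```python
import json
import urllib.request

MODEL_ID = "theprint/Llama3.2-3B-Explained"


def build_chat_request(base_url, content, model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, content):
    """Send the request and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(base_url, content)) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("http://localhost:8000", "What is the capital of France?"))
```

Building the request separately from sending it keeps the payload easy to inspect or reuse with another HTTP client.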
- SGLang
How to use theprint/Llama3.2-3B-Explained with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "theprint/Llama3.2-3B-Explained" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "theprint/Llama3.2-3B-Explained",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "theprint/Llama3.2-3B-Explained" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "theprint/Llama3.2-3B-Explained",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use theprint/Llama3.2-3B-Explained with Docker Model Runner:
```bash
docker model run hf.co/theprint/Llama3.2-3B-Explained
```
Llama3.2-3B-Explained
A fine-tuned version of meta-llama/Llama-3.2-3B-Instruct, trained on the Explained 0.41k dataset (Alpaca format) using Auto-SFT, an automated hyperparameter search and supervised fine-tuning pipeline.
The base model was adapted to follow the style and content of the Explained 0.41k Alpaca dataset, so any gains are most likely on tasks similar to those represented in the training data.
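Alpaca-style datasets store each example as an `instruction`, an optional `input`, and an `output`. One plausible way to turn such a record into a chat exchange for the Llama 3.2 Instruct chat template is sketched below. The field names follow the common Alpaca convention, but how Auto-SFT actually formats this dataset is an assumption, the helper `alpaca_to_messages` is hypothetical, and the example record is illustrative rather than taken from the training data.

```python
def alpaca_to_messages(record: dict) -> list:
    """Convert one Alpaca-style record into a user/assistant chat exchange.

    Assumes the common Alpaca fields: "instruction", optional "input", "output".
    """
    prompt = record["instruction"].strip()
    extra = record.get("input", "").strip()
    if extra:
        # The optional "input" field carries context for the instruction
        prompt = f"{prompt}\n\n{extra}"
    return [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": record["output"].strip()},
    ]


# Illustrative record, not from the Explained dataset
example = {
    "instruction": "Explain what a hash table is.",
    "input": "",
    "output": "A hash table maps keys to values using a hash function.",
}
print(alpaca_to_messages(example))
```

The resulting list has the same shape as the `messages` used in the Transformers examples above, so it can be passed to `tokenizer.apply_chat_template`.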
Model Details
| Property | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-3B-Instruct |
| Training data | data/Explained-0.41k-alpaca.json |
| Fine-tuning epochs | 2 |
| Fine-tuning date | 2026-03-25 |
| Fine-tuning method | LoRA (adapters merged into full 16-bit weights) |
Training Hyperparameters
LoRA
| Parameter | Value |
|---|---|
| r | 4 |
| alpha | 8 |
| dropout | 0.0 |
| target_modules | ['q_proj', 'v_proj', 'k_proj', 'o_proj'] |
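With only the four attention projections targeted and r = 4, the adapter is small. Each adapted matrix of shape d_out × d_in gains two factors, A (r × d_in) and B (d_out × r), for r · (d_in + d_out) trainable weights. The sketch below totals this across layers. The Llama-3.2-3B dimensions in it (hidden size 3072, 8 key/value heads of size 128, 28 layers) are assumptions to check against the base model's `config.json`, not values stated on this card.

```python
# Assumed Llama-3.2-3B attention shapes (verify against the base model's config.json)
HIDDEN = 3072      # hidden_size
KV_DIM = 8 * 128   # num_key_value_heads * head_dim (grouped-query attention)
LAYERS = 28        # num_hidden_layers
R = 4              # LoRA rank from the table above

# (d_in, d_out) of each targeted projection
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(shapes, r, layers):
    """Each adapter adds A (r x d_in) and B (d_out x r): r * (d_in + d_out) weights."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * layers


print(lora_params(SHAPES, R, LAYERS))  # 2293760
```

Under these assumptions the adapter trains about 2.3M weights, a small fraction of the base model's roughly 3.2B parameters.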
Training
| Parameter | Value |
|---|---|
| learning_rate | 1e-05 |
| batch_size | 1 |
| gradient_accumulation_steps | 2 |
| warmup_ratio | 0.0 |
| max_seq_length | 512 |
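These settings imply an effective batch of batch_size × gradient_accumulation_steps = 1 × 2 = 2 examples per optimizer update. The sketch below turns that into step counts, assuming the dataset holds exactly 410 examples (the "0.41k" in its name) and that an incomplete accumulation window at the end of an epoch is dropped; real trainers differ in such rounding details, and the `schedule` helper is hypothetical.

```python
import math

# Values from the training table; NUM_EXAMPLES assumes "0.41k" means exactly 410 rows
NUM_EXAMPLES = 410
BATCH_SIZE = 1
GRAD_ACCUM = 2
EPOCHS = 2
WARMUP_RATIO = 0.0


def schedule(n, batch, accum, epochs, warmup_ratio):
    """Return (effective batch, optimizer steps per epoch, total steps, warmup steps)."""
    effective = batch * accum                 # examples per optimizer update
    micro_batches = math.ceil(n / batch)      # forward/backward passes per epoch
    steps_per_epoch = max(micro_batches // accum, 1)
    total = steps_per_epoch * epochs
    warmup = math.ceil(total * warmup_ratio)  # 0 with warmup_ratio = 0.0
    return effective, steps_per_epoch, total, warmup


print(schedule(NUM_EXAMPLES, BATCH_SIZE, GRAD_ACCUM, EPOCHS, WARMUP_RATIO))  # (2, 205, 410, 0)
```

With a warmup ratio of 0.0, the learning rate starts at its configured value (1e-05) from the first update.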
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged 16-bit checkpoint and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("theprint/Llama3.2-3B-Explained")
tokenizer = AutoTokenizer.from_pretrained("theprint/Llama3.2-3B-Explained")
```
Generated by Auto-SFT