Model Card for Llama-3.2-3B-F1-Reasoning-Instruct (a.k.a. Formosa-1-Reasoning or F1-Reasoning)
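Llama-3.2-3B-F1-Reasoning-Instruct (a.k.a. Formosa-1-Reasoning or F1-Reasoning) is a Traditional Chinese language model developed jointly by Twinkle AI and APMIC, under the technical guidance of the National Center for High-performance Computing (國家高速網路與計算中心). It is fine-tuned for the language context and task requirements of Taiwan (Republic of China), covers diverse scenarios such as law, education, and everyday applications, and is tuned with strong instruction following as its goal.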
Model Details
Model Description
- Developed by: Liang Hsun Huang, Min Yi Chen, Wen Bin Lin, Chao Chun Chuang, and Dave Sung (all authors contributed equally to this work)
- Funded by: APMIC
- Model type: LlamaForCausalLM
- Language(s) (NLP): Traditional Chinese & English
- License: llama3.2
Model Sources
- Repository: twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct
- Paper: (TBA)
- Demo: Playground
Evaluation
Results
The table below uses the 🌟 Twinkle Eval evaluation framework.
| Model | Evaluation mode | TMMLU+ (%) | Taiwan Law (%) | MMLU (%) | Runs | Option order |
|---|---|---|---|---|---|---|
| mistralai/Mistral-Small-24B-Instruct-2501 | box | 56.15 (±0.0172) | 37.48 (±0.0098) | 74.61 (±0.0154) | 3 | Random |
| meta-llama/Llama-3.2-3B-Instruct | box | 15.49 (±0.0104) | 25.68 (±0.0200) | 6.90 (±0.0096) | 3 | Random |
| meta-llama/Llama-3.2-3B-Instruct | pattern | 35.85 (±0.0174) | 32.22 (±0.0023) | 59.33 (±0.0168) | 3 | Random |
| MediaTek-Research/Llama-Breeze2-3B-Instruct | pattern | 40.32 (±0.0181) | 38.92 (±0.0193) | 55.37 (±0.0180) | 3 | Random |
| 🌟 twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct (ours) | box | 46.16 (±0.0198) | 34.92 (±0.0243) | 51.22 (±0.0206) | 3 | Random |
The table below uses the lighteval evaluation framework.
| Model | MATH-500 | GPQA Diamond |
|---|---|---|
| meta-llama/Llama-3.2-3B-Instruct | 44.40 | 27.78 |
| 🌟 twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct (ours) | 51.40 | 33.84 |
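To make the lighteval comparison concrete, the short sketch below computes the gain of the fine-tuned model over the base instruct model in percentage points. The scores are copied from the table; the helper name `point_gains` is only illustrative.

```python
# Scores (in percent) copied from the lighteval table.
scores = {
    "meta-llama/Llama-3.2-3B-Instruct": {"MATH-500": 44.40, "GPQA Diamond": 27.78},
    "twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct": {"MATH-500": 51.40, "GPQA Diamond": 33.84},
}


def point_gains(base: dict, tuned: dict) -> dict:
    """Return tuned-minus-base differences in percentage points (2 decimals)."""
    return {bench: round(tuned[bench] - base[bench], 2) for bench in base}


gains = point_gains(
    scores["meta-llama/Llama-3.2-3B-Instruct"],
    scores["twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct"],
)
print(gains)  # {'MATH-500': 7.0, 'GPQA Diamond': 6.06}
```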
Use this model
vLLM
vllm serve twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct \
--port 8001 \
--enable-reasoning \
--reasoning-parser deepseek_r1 \
--enable-auto-tool-choice \
--tool-call-parser hermes
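Once the server is running, it can be queried through its OpenAI-compatible endpoint. The sketch below uses only the Python standard library and assumes the server is reachable at `localhost:8001`, matching the `--port` flag. The helper names `build_chat_request` and `ask` are illustrative, and the sampling values mirror the ones used in the tool-calling example in this card.

```python
import json
import urllib.request

# Assumption: the vLLM server is reachable on localhost:8001.
BASE_URL = "http://localhost:8001/v1"
MODEL = "twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct"


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> dict:
    """Send the request and return the first choice's message as a dict."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]


request = build_chat_request("台北氣溫如何?")
print(request.full_url)
```

Calling `ask(...)` needs a live server. With the reasoning parser enabled, the returned message is expected to carry the reasoning in `reasoning_content` and the answer in `content`.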
Ollama
ollama run TwinkleAI/Llama-3.2-3B-F1-Resoning-Instruct
LM Studio
In My Models, find the model you want to use and click ⚙️ Edit model default config. Then switch to the Prompt tab, clear the existing Prompt Template, and paste in the following:
{% if bos_token is defined %}{{ bos_token }}{% endif %}
<|start_header_id|>system<|end_header_id|>
{% set first_is_system = messages|length > 0 and messages[0].role == 'system' %}
{% set has_tools = tools and tools|length > 0 %}
{% if not has_tools and first_is_system %}
{{ messages[0].content }}
{% elif has_tools and first_is_system %}
{{ messages[0].content }}
{% elif has_tools and not first_is_system %}
You are a function calling AI model.
{% endif %}
{% if tools and tools|length > 0 %}
You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
<tools>
[
{% for t in tools %}
{% set f = t.function if t.function is defined else t %}
{
"type": "function",
"function": {
"name": "{{ f.name }}",
"description": "{{ f.description }}",
"parameters": {{ f.parameters | tojson }}
}
}{% if not loop.last %},{% endif %}
{% endfor %}
]
</tools>
For each function call return a json object with function name and arguments within <tool_call> </tool_call> tags with the following schema:
<tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</tool_call>
{% endif %}<|eot_id|>
{% for m in messages %}
{% if not (loop.first and m.role == 'system') %}
{% if m.role == 'user' %}
<|start_header_id|>user<|end_header_id|>
{{ m.content }}<|eot_id|>
{% elif m.role == 'assistant' %}
<|start_header_id|>assistant<|end_header_id|>
{% if m.tool_calls is defined and m.tool_calls %}
{% for tc in m.tool_calls %}
{% if tc.function is defined %}
<tool_call>
{"name": "{{ tc.function.name }}", "arguments": {{ tc.function.arguments | tojson }}}
</tool_call>
{% else %}
<tool_call>
{"name": "{{ tc.name }}", "arguments": {{ tc.arguments | tojson }}}
</tool_call>
{% endif %}
{% endfor %}
{% else %}
{{ m.content }}
{% endif %}<|eot_id|>
{% elif m.role == 'tool' %}
<|start_header_id|>ipython<|end_header_id|>
<tool_response>
{{ m.content }}
</tool_response><|eot_id|>
{% endif %}
{% endif %}
{% endfor %}
{% if add_generation_prompt %}
<|start_header_id|>assistant<|end_header_id|>
{% endif %}
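As a reading aid, the following pure-Python sketch approximates what the template produces when no tools are passed. It is not the Jinja template itself: the exact whitespace between segments is an assumption, `<|begin_of_text|>` is assumed as the BOS token, assistant `tool_calls` are not handled, and `render_prompt` is a hypothetical helper name.

```python
BOS = "<|begin_of_text|>"  # assumption: the Llama 3 BOS token


def header(role: str) -> str:
    """Role header in the Llama 3 chat format."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n"


def render_prompt(messages, add_generation_prompt=True):
    """Approximate the template's output for the no-tools path."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    # The template always opens a system block, even if it stays empty.
    parts = [BOS, header("system")]
    if has_system:
        parts.append(messages[0]["content"])
    parts.append("<|eot_id|>\n")

    # Tool results are sent back under the 'ipython' role.
    role_map = {"user": "user", "assistant": "assistant", "tool": "ipython"}
    for m in messages:
        if m["role"] not in role_map:
            continue  # system messages have no branch inside the loop
        body = m["content"]
        if m["role"] == "tool":
            body = f"<tool_response>\n{body}\n</tool_response>"
        parts.append(header(role_map[m["role"]]) + body + "<|eot_id|>\n")

    if add_generation_prompt:
        parts.append(header("assistant"))
    return "".join(parts)


prompt = render_prompt(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
    ]
)
print(prompt)
```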
🔧 Tool Calling
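This model was trained with the Hermes format and supports parallel calling. A complete example flow follows. The tool-call template has already been written into the chat template, so enjoy it!
1️⃣ Start the vLLM backend
⚠️ Note: vLLM >= 0.8.3 is required; otherwise enable-reasoning and enable-auto-tool-choice cannot be enabled at the same time.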
vllm serve twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct \
--port 8001 \
--enable-reasoning \
--reasoning-parser deepseek_r1 \
--enable-auto-tool-choice \
--tool-call-parser hermes
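Because the two flags depend on the vLLM version, it can help to check the installed version before launching. The sketch below is one way to do that with the standard library. `parse_release` and `vllm_is_new_enough` are hypothetical helper names, and the parser is deliberately simple: it keeps only the leading numeric components, so a pre-release such as `0.8.3rc1` compares equal to `0.8.3`.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_VLLM = (0, 8, 3)  # minimum version stated in the note above


def parse_release(text: str) -> tuple:
    """Turn '0.8.3' or '0.8.3.post1' into (0, 8, 3), dropping suffixes."""
    numbers = []
    for part in text.split("."):
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def vllm_is_new_enough() -> bool:
    """Return True only if vLLM is installed and at least MIN_VLLM."""
    try:
        return parse_release(version("vllm")) >= MIN_VLLM
    except PackageNotFoundError:
        return False


print(vllm_is_new_enough())
```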
2️⃣ Define the tools (functions)
def get_weather(location: str, unit: str):
return f"{location}的氣溫是{unit}26度,晴朗無風"
def search(query: str):
return "川普終於宣布對等關稅政策,針對 18 個經濟體課徵一半的對等關稅,並從 4/5 起對所有進口產品徵收10%的基準關稅!美國將針對被認定為不當貿易行為(不公平貿易) 的國家,於 4/9 起課徵報復型對等關稅 (Discounted Reciprocal Tariff),例如:日本將被課徵 24% 的關稅,歐盟則為 20%,以取代普遍性的 10% 關稅。\n針對中國則開啟新一波 34% 關稅,並疊加於先前已實施的關稅上,這將使中國進口商品的基本關稅稅率達到 54%,而且這尚未包含拜登總統任內或川普第一任期所施加的額外關稅。加拿大與墨西哥則不適用這套對等關稅制度,但川普認為這些國家在芬太尼危機與非法移民問題尚未完全解決,因此計畫對這兩國的大多數進口商品施加 25% 關稅。另外原本針對汽車與多數其他商品的關稅豁免將於 4/2 到期。\n台灣的部分,美國擬向台灣課徵32%的對等關稅,雖然並未針對晶片特別課徵關稅,但仍在記者會中提到台灣搶奪所有的電腦與半導體晶片,最終促成台積電對美國投資計劃額外加碼 1,000 億美元的歷史性投資;歐盟則課徵20%的對等關稅。最後是汽車關稅將於 4/2 起,對所有外國製造的汽車課徵25% 關稅。"
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "國家或城市名, e.g., 'Taipei'、'Jaipei'"},
"unit": {"type": "string", "description": "氣溫單位,亞洲城市使用攝氏;歐美城市使用華氏", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location", "unit"]
}
}
},
{
"type": "function",
"function": {
"name": "search",
"description": "這是一個類似 Google 的搜尋引擎,關於知識、天氣、股票、電影、小說、百科等等問題,如果你不確定答案就搜尋一下。",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "should be a search query, e.g., '2024 南韓 戒嚴'"}
},
"required": ["query"]
}
}
}
]
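Each tool schema has to stay in sync with the Python function it describes, or the model may produce arguments the function cannot accept. The sketch below shows one way to check this with `inspect`. `schema_mismatches` is a hypothetical helper, and the function and schema here are shortened stand-ins for the definitions above.

```python
import inspect


def schema_mismatches(tool: dict, fn) -> list:
    """List the differences between a tool schema and a function signature."""
    spec = tool["function"]
    declared = set(spec["parameters"]["properties"])
    required = set(spec["parameters"].get("required", []))
    params = inspect.signature(fn).parameters
    accepted = set(params)
    no_default = {
        name for name, p in params.items() if p.default is inspect.Parameter.empty
    }

    problems = []
    if spec["name"] != fn.__name__:
        problems.append(f"name: schema {spec['name']!r} vs function {fn.__name__!r}")
    if declared != accepted:
        problems.append(
            f"parameters: schema {sorted(declared)} vs function {sorted(accepted)}"
        )
    if not no_default <= required:
        problems.append(f"missing from 'required': {sorted(no_default - required)}")
    return problems


# Shortened stand-ins for the get_weather function and its schema.
def get_weather(location: str, unit: str):
    return f"{location}: 26 degrees ({unit})"


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
}

print(schema_mismatches(weather_tool, get_weather))  # []
```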
3️⃣ Run the tool calls
⚠️ Note: the system prompt is optional, unless a tool needs a time reference.
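from openai import OpenAI

# Assumption: the vLLM server from step 1 is listening on localhost:8001.
# The API key is a placeholder; vLLM only checks it if one was set at launch.
client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")

response = client.chat.completions.create(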
model=client.models.list().data[0].id,
messages=[
{"role": "system", "content": "記住你的知識截止於 2024/12,今天是 2025/4/7"},
{"role": "user", "content": "台北氣溫如何? 另外,告訴我川普最新關稅政策"},
],
max_tokens=1500,
temperature=0.6,
top_p=0.95,
tools=tools,
tool_choice="auto"
)
print(response.choices[0].message.reasoning_content)
print(response.choices[0].message.tool_calls)
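🧠 Reasoning output (excerpt, translated from Traditional Chinese)
Okay, I need to help this user with their questions. They asked about two things: first, the weather in Taipei, and second, Trump's recent tariff policy.
For the first part, they mentioned "Taipei", so I should call the get_weather function…
Next is Trump's new tariff policy…
To summarize, I need to make two separate API calls, each with its own correctly filled-in parameters…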
⚙️ Tool Calls List
[ChatCompletionMessageToolCall(id='chatcmpl-tool-35e74420119349999913a10133b84bd3', function=Function(arguments='{"location": "Taipei", "unit": "celsius"}', name='get_weather'), type='function'), ChatCompletionMessageToolCall(id='chatcmpl-tool-7ffdcb98e59f4134a6171defe7f2e31b', function=Function(arguments='{"query": "Donald Trump latest tariffs policy"}', name='search'), type='function')]
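With parallel calling, the order of the returned tool calls is not something the client should rely on, so it is safer to dispatch each call by function name than by list position. The sketch below shows one way to build the `role: "tool"` messages. `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, the two functions are simplified stand-ins for the tools defined earlier, and `SimpleNamespace` objects stand in for the `ChatCompletionMessageToolCall` objects.

```python
import json
from types import SimpleNamespace


# Simplified stand-ins for the tools defined in step 2.
def get_weather(location: str, unit: str):
    return f"{location}: 26 degrees ({unit}), clear and calm"


def search(query: str):
    return f"search results for: {query}"


TOOL_REGISTRY = {"get_weather": get_weather, "search": search}


def run_tool_calls(tool_calls, registry):
    """Return one role='tool' message per call, matched by function name."""
    messages = []
    for call in tool_calls:
        fn = registry[call.function.name]
        kwargs = json.loads(call.function.arguments)
        messages.append(
            {
                "role": "tool",
                "content": fn(**kwargs),
                # tool_call_id ties each result back to the call that produced it.
                "tool_call_id": call.id,
            }
        )
    return messages


# Stand-ins shaped like the tool call objects printed above.
calls = [
    SimpleNamespace(
        id="call-1",
        function=SimpleNamespace(
            name="get_weather",
            arguments='{"location": "Taipei", "unit": "celsius"}',
        ),
    ),
    SimpleNamespace(
        id="call-2",
        function=SimpleNamespace(
            name="search",
            arguments='{"query": "Donald Trump latest tariffs policy"}',
        ),
    ),
]

tool_messages = run_tool_calls(calls, TOOL_REGISTRY)
for message in tool_messages:
    print(message)
```

In a real request, `response.choices[0].message.tool_calls` would take the place of `calls`, and the resulting messages would be appended after the assistant message that carries the tool calls.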
4️⃣ Generate the final answer
import json

response = client.chat.completions.create(
model=client.models.list().data[0].id,
messages=[
{"role": "system", "content": "記住你的知識截止於 2024/12,今天是 2025/4/7"},
{"role": "user", "content": "台北氣溫如何? 另外,告訴我川普最新關稅政策"},
{
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": response.choices[0].message.tool_calls[0].id,
"type": "function",
"function": {
"name": response.choices[0].message.tool_calls[0].function.name,
"arguments": response.choices[0].message.tool_calls[0].function.arguments
}
},
{
"id": response.choices[0].message.tool_calls[1].id,
"type": "function",
"function": {
"name": response.choices[0].message.tool_calls[1].function.name,
"arguments": response.choices[0].message.tool_calls[1].function.arguments
}
}
]
},
{
"role": "tool",
# tool_calls[0] is the get_weather call, so its arguments go to get_weather
"content": get_weather(**json.loads(response.choices[0].message.tool_calls[0].function.arguments)),
"tool_call_id": response.choices[0].message.tool_calls[0].id # tool_call_id is required so that each tool result is matched to its tool_call
},
{
"role": "tool",
# tool_calls[1] is the search call, so its arguments go to search
"content": search(**json.loads(response.choices[0].message.tool_calls[1].function.arguments)),
"tool_call_id": response.choices[0].message.tool_calls[1].id # tool_call_id is required so that each tool result is matched to its tool_call
}
],
max_tokens=1500,
temperature=0.6,
top_p=0.95,
tools=tools,
tool_choice="auto"
)
print(response.choices[0].message.reasoning_content)
print(response.choices[0].message.content)
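🧠 Reasoning output (excerpt, translated from Traditional Chinese)
First, I need to handle the user's query: they asked about the current temperature in Taipei and Trump's recent tariff policy…
After calling get_weather, I learned that the temperature in Taipei is 26 degrees (Celsius)…
Next, I used search to look up "Trump latest tariff policy 2025"…
Putting it together, I provide the following summary:
📋 Final output (translated from Traditional Chinese)
Here is the information you requested:
**Taipei temperature**
- The current temperature is **26°C** (Celsius)
- Weather conditions: clear and calm
**Overview of Trump's latest tariff policy**
1. **Reciprocal tariff policy**
- A 50% reciprocal tariff on 18 economies
- From April 5, a 10% baseline tariff on all imported products
2. **Retaliatory reciprocal tariffs**
- Japan 24%, EU 20%
3. **High tariffs on China**
- Raised to 54% (existing tariffs + a new 34%)
4. **Special cases**
- Canada and Mexico are exempt, but other goods are subject to 25%
- Tariff exemptions for automobiles and some other goods are about to expire
5. **Impact on Taiwan**
- The US plans to impose a 32% tariff on Taiwan, with no additional tariff on chips for now
6. **Global perspective**
- Tariff rates for the EU and Japan are comparatively high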
Citation
@misc{twinkleai2025llama3.2f1,
title = {Llama-3.2-3B-F1-Reasoning-Instruct: A Traditional Chinese Instruction-Tuned Reasoning Language Model for Taiwan},
author = {Huang, Liang Hsun and Chen, Min Yi and Lin, Wen Bin and Chuang, Chao Chun and Sung, Dave},
year = {2025},
howpublished = {\url{https://huggingface.co/twinkle-ai/Llama-3.2-3B-F1-Instruct}},
note = {Twinkle AI and APMIC. All authors contributed equally.}
}
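Acknowledgements
- We thank the National Center for High-performance Computing (國家高速網路與計算中心) for its guidance and APMIC for its compute support, which made it possible to complete this project successfully.
- We also thank teacher 黃啟聖, 許武龍 (哈爸), physics teacher 陳姿燁 of Taipei First Girls' High School (臺北市立第一女子高級中學), Howard, CTO of 奈視科技, AIPLUX Technology, teacher 郭家嘉, and everyone who provided valuable help in building the datasets.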
Model Card Authors
Model Card Contact
Model tree for twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct-GGUF
Base model
meta-llama/Llama-3.2-3B
Datasets used to train twinkle-ai/Llama-3.2-3B-F1-Reasoning-Instruct-GGUF
minyichen/tw_mm_R1
minyichen/tw-instruct-R1-200k
Evaluation results
- single choice on tmmlu+ (test set, self-reported): 46.160
- single choice on mmlu (test set, self-reported): 51.220
- single choice on tw-legal-benchmark-v1 (test set, self-reported): 34.920