Good for home automation
Large-context LLMs that work well with Home Assistant via a llama.cpp server running on CPU with 16 GB of RAM.
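For context, a minimal sketch of how a Home Assistant-style client talks to such a setup: llama.cpp's llama-server exposes an OpenAI-compatible chat endpoint, so a plain HTTP request is enough. The host, port, and model filename below are placeholder assumptions, not values taken from this collection.

```python
import requests

# Assumed local llama-server instance, e.g. started with something like:
#   llama-server -m model.gguf -c 16384 --host 127.0.0.1 --port 8080
# Host, port, and model path are placeholders.
API_URL = "http://127.0.0.1:8080/v1/chat/completions"

payload = {
    "messages": [
        {"role": "system", "content": "You are a Home Assistant voice assistant."},
        {"role": "user", "content": "Turn off the living room lights."},
    ],
}

resp = requests.post(API_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```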
17B • Note: FAST! Solid function calling, but roleplay takes quite a bit of prompt trial and error. Good context memory.
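Several notes in this list call out function calling, so here is a hedged sketch of what a tool-call request to llama-server looks like over the OpenAI-compatible API (the server generally needs to be started with --jinja for the chat template to emit tool calls). The set_light tool schema is hypothetical, purely for illustration.

```python
import json
import requests

API_URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local server

# Hypothetical Home Assistant-style tool; not taken from any model card here.
tools = [{
    "type": "function",
    "function": {
        "name": "set_light",
        "description": "Turn a light on or off in a given area.",
        "parameters": {
            "type": "object",
            "properties": {
                "area": {"type": "string", "description": "Room name, e.g. 'kitchen'"},
                "state": {"type": "string", "enum": ["on", "off"]},
            },
            "required": ["area", "state"],
        },
    },
}]

payload = {
    "messages": [{"role": "user", "content": "Kitchen lights off, please."}],
    "tools": tools,
}
resp = requests.post(API_URL, json=payload, timeout=120).json()
# A model with solid function calling returns a structured tool call here
# instead of free-form text.
for call in resp["choices"][0]["message"].get("tool_calls", []):
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```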
Orion-zhen/Qwen3-30B-A3B-Instruct-2507-IQK-GGUF
31B • Note: Excellent roleplay, flawless function calling and logic. Poor memory. Prefill is slow, but the prompt cache makes repeat requests very fast. Use intelligent expert skipping for faster generation. Requires https://github.com/ikawrakow/ik_llama.cpp
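"Intelligent expert skipping" here refers, as far as I can tell, to ik_llama.cpp's smart expert reduction, which drops low-weight experts in MoE models to speed up generation at a small quality cost. A launch sketch follows; the -ser flag and its "6,1" argument are assumptions from my reading of that project, so verify the exact syntax against its docs, and the GGUF filename is a placeholder.

```python
import subprocess

# Assumed invocation of ik_llama.cpp's server binary. The -ser
# (smart expert reduction) flag and its value are assumptions --
# check the ik_llama.cpp documentation before relying on them.
subprocess.run([
    "./llama-server",
    "-m", "Qwen3-30B-A3B-Instruct-2507-IQ4_K.gguf",  # placeholder filename
    "-c", "16384",       # context size
    "-ser", "6,1",       # skip low-weight experts for faster generation (assumed syntax)
    "--port", "8080",
])
```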
Intel/Qwen3-30B-A3B-Instruct-2507-gguf-q2ks-mixed-AutoRound
31B
Tiiny/SmallThinker-21BA3B-Instruct
Text Generation • 22B • Note: Fantastic all-around. Good roleplay, consistent function calling. With prompt caching enabled, prefill is very fast after a painfully slow first pass, though token generation is a bit slow. Q3_K_M, temp 0.6, top_p 0.95, top_k 20.
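The sampler settings above (temp 0.6, top_p 0.95, top_k 20) can be passed per request. A sketch, again assuming a local llama-server; top_k is a llama.cpp extension field rather than part of the standard OpenAI schema, so other backends may ignore it.

```python
import requests

API_URL = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local server

payload = {
    "messages": [{"role": "user", "content": "Is anyone home right now?"}],
    # Sampler settings from the note above. top_k is a llama-server
    # extension, not standard OpenAI API.
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
}
resp = requests.post(API_URL, json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```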
mradermacher/BitCPM4-1B-GGUF
1B • Note: Works unusually well for a 1B-parameter model, and it's ternary! Doesn't seem to support prompt caching, so prefill is slow, but text generation is absurdly fast.
redponike/Ling-mini-2.0-GGUF-ik
Text Generation • 16B
openbmb/VoxCPM-0.5B
Text-to-Speech