Nex-N2-mini Q8_0 GGUF

Quantized GGUF version of nex-agi/Nex-N2-mini โ€” an agentic model with Agentic Thinking, built on Qwen3.5-35B-A3B-Base.

โš ๏ธ Disclaimer

This is my first time uploading a quantized model to Hugging Face. I am not affiliated with nex-agi. This is a community contribution โ€” use at your own discretion.

I have modified the chat template so that reasoning (thinking) works correctly with llama.cpp. Without this adjustment, the reasoning output was not properly separated from the response. If you encounter any issues, please open a discussion.

Available Quantization

File Quant Size
Nex-N2-mini-Q8_0.gguf Q8_0 ~35 GB

Usage with llama.cpp

Recommended Start Parameters

./llama-server \
  --model Nex-N2-mini-Q8_0.gguf \
  --jinja \
  --reasoning on \
  --reasoning-format deepseek-legacy \
  --temp 0.7 \
  --top-p 0.95 \
  --top-k 40 \
  -ngl 99

Key Flags Explained

Flag Value Purpose
--jinja โ€” Enable Jinja2 chat template rendering (required for custom template)
--reasoning on Enable reasoning/thinking output
--reasoning-format deepseek-legacy Use DeepSeek-style reasoning format
-ngl 99 Offload all layers to GPU

About Nex-N2-mini

Nex-N2-mini is an agentic model with Agentic Thinking โ€” a framework that unifies reasoning, tool use, and environment execution:

  • Adaptive Thinking: The model decides when and how deeply to think
  • Coherent Thinking: Consistent reasoning paradigm across tasks and modalities

Built on Qwen3.5-35B-A3B-Base (MoE architecture, 35B total / 3B active parameters).

For more information, see the original model card: nex-agi/Nex-N2-mini

License

Apache-2.0 (same as the original model)

Downloads last month
668
GGUF
Model size
35B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for hendrik289/Nex-N2-mini-Q8_0-GGUF

Quantized
(35)
this model