Qwen3.5-9B Uncensored — No-Think Edition (GGUF)

⚡ Zero refusals. Zero thinking delay. 100% local.

This is a patched GGUF of HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive with one key modification: thinking is disabled at the GGUF template level, giving you instant responses without the 15–30 second reasoning delay.

What's different

Qwen3.5 is a thinking model. By default it outputs a <think>...</think> block before every response. This is great for hard problems but brutal for everyday use — you wait 20 seconds for a simple answer.

This model patches the embedded Jinja2 chat template to always output an empty think block:

Original flow:  <think> [400 tokens] </think> → answer    (~25s wait)
This model:     <think></think> → answer                  (<1s wait)

The model's intelligence is encoded in its weights, not the thinking trace. Quality is the same. Speed is 25x better for time-to-first-token.

Want reasoning on demand? Add /think to any message — the model will reason through it fully for that turn only.

Model details

Property Value
Base Qwen3.5-9B
Fine-tune HauhauCS Uncensored Aggressive
Quantization Q4_K_M
Context Up to 65,536 tokens
Parameters 9B
Format GGUF
Refusal rate 0%

Benchmarks (MacBook Pro M2 Pro, 16 GB)

Metric Value
Generation speed ~22–25 tok/s
Time to first token < 1 second
Context window 65,536 tokens
VRAM usage ~8.5 GB

How to use

LM Studio (recommended)

  1. Download the Q4_K_M file below
  2. Load in LM Studio with --context-length 65536 --gpu max
  3. Done — no config needed, thinking is already patched off

Optimal sampling (Qwen3 official recommended)

Temperature: 0.6
Top-P: 0.95
Top-K: 20
Repeat penalty: 1.0
Max tokens: 4096

llama.cpp

./llama-cli -m Qwen3.5-9B-Uncensored-nothink-Q4_K_M.gguf \
  --ctx-size 65536 \
  --n-gpu-layers 99 \
  -p "Your prompt here"

Full automated setup for Mac

👉 github.com/nandukmelath/lmstudio-uncensored-setup

One command: VRAM boost + auto-start + model load + Hermes Agent config:

git clone https://github.com/nandukmelath/lmstudio-uncensored-setup
cd lmstudio-uncensored-setup && ./scripts/setup.sh

How the patch works

The Qwen3.5 GGUF contains an embedded Jinja2 chat template with this block:

{%- if enable_thinking is defined and enable_thinking is false %}
    {{- '<think>\n\n</think>\n\n' }}
{%- else %}
    {{- '<think>\n' }}
{%- endif %}

The patch replaces it with just:

{{- '<think>\n\n</think>\n\n' }}

Same file size (padded with spaces), same structure, zero thinking overhead. The patcher script is open source: patch_nothink.py

Credits

License

Apache 2.0

Downloads last month
105
GGUF
Model size
9B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 1 Ask for provider support

Model tree for nandukmelath/Qwen3.5-9B-Uncensored-nothink-GGUF

Finetuned
Qwen/Qwen3.5-9B
Quantized
(8)
this model