Solenopsisbot/Qwen3-4b-Deep-Beta

This is a deep-reasoning finetune of Qwen/Qwen3-4B-Thinking-2507. It was trained with reinforcement learning (GRPO) to enforce deep thinking chains.

The model is still a bit quirky: you may have to force it to actually finish thinking and give a response before generation stops. I'll hopefully have that fixed by the time this model is out of beta. If you want to contact me, my Discord is @solenopsisbot.
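One rough way to handle the unfinished-thinking behaviour is sketched below. This is not part of the model or an official workaround. It assumes the model closes its reasoning with a `</think>` tag, as the Qwen3 thinking models do, and that you have some `generate(text) -> str` function that returns only the continuation for the text it is given. `generate`, `split_thinking`, and `force_answer` are hypothetical names used for illustration.

```python
from typing import Callable, Tuple

THINK_END = "</think>"  # assumed end-of-reasoning marker (Qwen3 thinking format)


def split_thinking(text: str) -> Tuple[str, str, bool]:
    """Split a raw completion into (reasoning, answer, finished).

    finished is True only if the reasoning block was closed AND
    a non-empty answer follows it.
    """
    head, sep, tail = text.partition(THINK_END)
    reasoning = head.replace("<think>", "").strip()
    answer = tail.strip()
    return reasoning, answer, bool(sep) and bool(answer)


def force_answer(prompt: str, generate: Callable[[str], str], max_rounds: int = 2) -> str:
    """Call generate() again until the model has produced a final answer.

    generate is a hypothetical callable: it takes the full text so far
    and returns only the newly generated continuation.
    """
    text = generate(prompt)
    for _ in range(max_rounds):
        _, answer, finished = split_thinking(text)
        if finished:
            return answer
        if THINK_END not in text:
            # The model stopped mid-thought: close the reasoning block by hand.
            text += "\n" + THINK_END + "\n\n"
        # Ask for a continuation that starts after the closed reasoning block.
        text += generate(prompt + text)
    # May be an empty string if the model never produced an answer.
    return split_thinking(text)[1]
```

In practice `generate` would wrap whatever backend you run the model with. Raising the maximum number of new tokens is worth trying first, since long thinking chains can simply run out of budget before the answer starts.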

Safetensors
Model size: 4B params
Tensor type: F16
