Solenopsisbot/Qwen3-4b-Deep-Beta
This is a deep-reasoning fine-tune of Qwen/Qwen3-4B-Thinking-2507. It was trained with reinforcement learning (GRPO) to enforce deep thinking chains.
It is still a bit weird: you may have to force the model to finish thinking and give a response before it stops generating. I'll hopefully have that fixed by the time this model is out of beta.
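One possible workaround is sketched below. It assumes the model marks the end of its reasoning with a `</think>` tag, as the base Qwen3 thinking models do. If the tag is missing from the output, you append it yourself and generate again so the model moves on to its answer. The helper names `split_thinking` and `force_answer_prompt` are hypothetical and are not part of `transformers` or any other library.

```python
from typing import Optional, Tuple

# Assumption: reasoning ends with "</think>", as in the base Qwen3 thinking models.
THINK_START = "<think>"
THINK_END = "</think>"


def _strip_open_tag(text: str) -> str:
    """Remove a leading <think> tag if the model emitted one itself.
    (The chat template may already have inserted it into the prompt.)"""
    text = text.strip()
    if text.startswith(THINK_START):
        text = text[len(THINK_START):]
    return text.strip()


def split_thinking(output: str) -> Tuple[str, Optional[str]]:
    """Split raw generated text into (reasoning, answer).

    answer is None when the model never closed its thinking block,
    i.e. it stopped mid-reasoning without giving a response.
    """
    if THINK_END not in output:
        return _strip_open_tag(output), None
    reasoning, answer = output.split(THINK_END, 1)
    return _strip_open_tag(reasoning), answer.strip()


def force_answer_prompt(prompt: str, output: str) -> str:
    """Build a continuation prompt that closes the thinking block by hand,
    so that a second generation call produces only the final answer."""
    _, answer = split_thinking(output)
    if answer is not None:
        raise ValueError("output already contains a final answer")
    return prompt + output.rstrip() + "\n" + THINK_END + "\n\n"
```

In practice you would call `split_thinking` on each generation. When `answer` is `None`, pass the string returned by `force_answer_prompt` back to the model as a raw prompt, with no chat template applied again.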
If you want to contact me, my Discord is @solenopsisbot.