Solenopsisbot/Qwen3-4b-Deep-Beta

This is a deep-reasoning finetune of Qwen/Qwen3-4B-Thinking-2507. It was trained with reinforcement learning (GRPO) to enforce deep thinking chains.

It is still a bit quirky: you may have to force the model to actually finish thinking and give a response before it stops generating, but I hope to have that fixed by the time this model is out of beta. If you want to contact me, my Discord is @solenopsisbot.
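
Until the unfinished-thinking quirk is fixed, one possible workaround is to check whether the output ever closed its reasoning block and, if not, close it yourself and generate again. This is only a rough sketch: it assumes the model marks the end of its reasoning with `</think>` like the base Qwen3 thinking models, and `split_thinking` and `force_answer_prefix` are hypothetical helper names, not part of any library.

```python
# Assumption: the reasoning block ends with this tag, as in the base Qwen3 thinking models.
THINK_END = "</think>"


def split_thinking(text: str):
    """Split raw model output into (reasoning, answer, finished).

    finished is False when the closing tag never appeared, meaning the
    model stopped while it was still thinking.
    """
    head, sep, tail = text.partition(THINK_END)
    # The opening tag may or may not be present in the output; drop it if it is.
    reasoning = head.replace("<think>", "", 1).strip()
    if not sep:
        return reasoning, "", False
    return reasoning, tail.strip(), True


def force_answer_prefix(partial: str) -> str:
    """Close an unfinished reasoning block so a follow-up generation
    call can continue straight into the final answer."""
    return partial.rstrip() + "\n" + THINK_END + "\n\n"


if __name__ == "__main__":
    raw = "First I add 2 and 2, which gives 4"
    reasoning, answer, finished = split_thinking(raw)
    if not finished:
        # Append this to the prompt and call generate again.
        print(repr(force_answer_prefix(raw)))
```

If `finished` comes back `False`, append the string from `force_answer_prefix` to the original prompt and run generation a second time; raising the maximum number of new tokens on the first pass may also help.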