Active filters: grpo
Dongwei/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math_smalllr
Text Generation
• 2B • Updated • 3
mradermacher/Qwen2.5-1.5B-Thinking-v1.1-GGUF
2B • Updated • 81
• 2
mradermacher/Qwen2.5-1.5B-Thinking-v1.1-i1-GGUF
2B • Updated • 162
• 1
Dongwei/Qwen-2.5-7B_Base_Math_smalllr
Text Generation
• 8B • Updated • 12
• 6
jeremierostan/qwen-guiding-question
Updated
May811/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 2B • Updated • 2
spinech/qwen2.5-3b-r1-arc-train-thinker-2
Text Generation
• 3B • Updated • 1
Dongwei/Qwen-2.5-7B_Base_Math_smallestlr
Text Generation
• 8B • Updated • 1
Dongwei/Qwen-2.5-7B_Base_Math_smallestlr_newdata
Text Generation
• 8B • Updated • 7
sohyunan/gemma-2-2b-it_controller-grpo
Text Generation
• 3B • Updated • 4
zzhang1987/Qwen2.5-VL-3B-Instruct-Open-R1-Distill
Image-Text-to-Text
• 4B • Updated • 9
rzhao17/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 3
Novaciano/Q5KM-Charcard-RP-1B-GRPO_MiniThinky-GGUF
Text Generation
• 1B • Updated • 8
• 2
schwamaths/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• Updated • 2
Chris126/qwen-r1-aha-moment
Updated
Text Generation
• 0.1B • Updated • 2
ibndias/Qwen2.5-1.5B-Open-R1-GRPO1st
Text Generation
• 2B • Updated • 3
jdqqjr/Qwen2.5-0.5B-Open-R1-GRPO
Text Generation
• 0.6B • Updated • 5
khuang2/qwen-2.5-3b-r1-countdown-offline_query_gen
Text Generation
• 3B • Updated • 5
mradermacher/qwen-2.5-3b-r1-countdown-GGUF
3B • Updated • 29
• 1
mradermacher/prem-1B-grpo-GGUF
Reinforcement Learning
• 1B • Updated • 21
mradermacher/qwen2.5-3b-r1-arc-train-thinker-GGUF
3B • Updated • 47
• 1
khuang2/qwen-2.5-3b-r1-countdown-offline_query_gen_solvable_only
Text Generation
• 3B • Updated • 3
schwamaths/Qwen2.5-1.5B-Instruct-Open-R1-GRPO
Text Generation
• Updated • 1
YC-DREAL/Qwen-2.5-3B-GRPO-Math
Text Generation
• 3B • Updated • 3
• 1
mradermacher/Qwen2.5-3B-Open-R1-GRPO-GGUF
3B • Updated • 6
binglinchengxia/Qwen-2.5-7B_Base_Math_smalllr
Text Generation
• 8B • Updated • 5
weltonwang88/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 2B • Updated • 2
gpandrad/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 2
sohyunan/gemma-2-2b-it_controller_sft_random_grpo
Text Generation
• 3B • Updated • 1