Active filters: grpo
Text Generation
• Updated • 25
peulsilva/reasoning-qwen-epoch0
Text Generation
• 0.5B • Updated • 2
peulsilva/reasoning-qwen-epoch1
Text Generation
• 0.5B • Updated • 2
spinech/qwen2.5-3b-r1-arc-train-synthetic
Text Generation
• 3B • Updated • 5
peulsilva/reasoning-qwen-epoch2
Text Generation
• 0.5B • Updated • 3
Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math
Text Generation
• 8B • Updated • 7
Text Generation
• 8B • Updated • 2
Dongwei/Qwen2.5-1.5B-Open-R1-GRPO_Math
Text Generation
• 2B • Updated • 2
Dongwei/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math
Text Generation
• 2B • Updated • 2
peulsilva/reasoning-qwen-epoch3
Text Generation
• 0.5B • Updated • 2
mradermacher/DeepSeek-R1-Distill-Qwen-7B-GRPO-GGUF
8B • Updated • 198
skzxjus/Qwen2.5-7B-Open-R1-GRPO
Text Generation
• 8B • Updated • 6
AndreasX1206/Qwen2-0.5B-countdown
Text Generation
• 0.5B • Updated • 3
mradermacher/Qwen-0.5B-GRPO-GGUF
0.5B • Updated • 69
alicogniai/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 2B • Updated • 3
ununtrium/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 2B • Updated • 3
mradermacher/DeepSeek-R1-Distill-Qwen-7B-GRPO-i1-GGUF
8B • Updated • 324
yuta0x89/llmjp13b-numinacot-epoch2-GRPO
Text Generation
• 14B • Updated • 7
yeshsurya/Qwen2.5-7B-Math-with_50stepGRPO
Text Generation
• 8B • Updated • 9
mradermacher/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math-GGUF
2B • Updated • 107
mradermacher/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math-GGUF
8B • Updated • 127
mradermacher/DeepSeek-R1-Qwen-2.5-1.5b-Latest-Unstructured-To-Structured-GGUF
2B • Updated • 130 • 1
hyunw3/qwen-2.5-0.5b-r1-countdown_lr5e-6
Text Generation
• 0.5B • Updated • 4
khuang2/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 7 • 2
spinech/qwen2.5-3b-r1-arc-train-thinker
Text Generation
• 3B • Updated • 5 • 1
Dongwei/DeepSeek-R1-Distill-Qwen-7B-GRPO_Math_lowlr
Text Generation
• 8B • Updated • 2
Dongwei/Qwen-2.5-7B_Math_smalllr
Text Generation
• 8B • Updated • 6
Dongwei/Qwen2.5-1.5B-Open-R1-GRPO_Math_smalllr
Text Generation
• 2B • Updated • 3
Dongwei/DeepSeek-R1-Distill-Qwen-1.5B-GRPO_Math_smalllr
Text Generation
• 2B • Updated • 3
mradermacher/Qwen2.5-1.5B-Thinking-v1.1-GGUF
2B • Updated • 112 • 2