Active filters: grpo
Text Generation
• Updated • 1
chirag12/Qwen2-0.5B-GRPO-test
Updated
bushou/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 2B • Updated • 1
umarigan/llama-3.2-8B-R1-Tr
Text Generation
• 8B • Updated • 2 • 1
valerielucro/Qwen2-0.5B-GRPO_dummy
Text Generation
• 2.43M • Updated • 1
fhai50032/Qwen2.5-GRPO-7B
Text Generation
• Updated • 658
hyunseoki/llama3.2-1b-Open-R1-GRPO-test0
Text Generation
• 1B • Updated • 6 • 1
saswatach/Qwen2-0.5B-GRPO-test
Updated
pmking27/SaishamMathLM-3B-R1
Text Generation
• Updated • 2
BleachNick/Llama-3.2-1B-Instruct-GRPO-45k_RAGv1.5
Text Generation
• 1B • Updated • 1
saswatach/qwen-r1-aha-moment
Updated
januverma/QwenMath0.5B_GRPO
Text Generation
• 0.5B • Updated • 2 • 1
chinmaydk99/Qwen2.5-0.5b-GRPO-math
Text Generation
• 0.5B • Updated • 11 • 1
kenhktsui/Qwen-0.5B-GRPO-gsm8k-count-wait-cap-cross-correct
Text Generation
• 0.5B • Updated • 2
suayptalha/ThinkerLlama-8B-v1
Text Generation
• 8B • Updated • 5 • 3
Trojanssafsdg/phi4_merged_16bit
Text Generation
• 15B • Updated • 5
totalyielddot/llama3.1_reasoning
Text Generation
• 8B • Updated • 2
mradermacher/Qwen2.5-1.5B-R1-GRPO-GGUF
2B • Updated • 30
jayasuryajsk/Qwen2.5-3B-reasoner
Text Generation
• 3B • Updated • 3
pashocles/qwen-2.5-3b-r1-countdown
Text Generation
• 3B • Updated • 6
mesbahuddin1989/Qwen-0.5B-GRPO
Text Generation
• 0.5B • Updated • 2
hooman650/MedQwen3B-Reasoner
Text Generation
• 3B • Updated • 115 • 13
Text Generation
• 15B • Updated • 1
NowaBwagel0/Llama3.2_3B-GRPO
Text Generation
• Updated • 3
msp5382/DeepSeek-R1-Distill-Qwen-1.5B-GRPO
Text Generation
• 2B • Updated • 2
khuang2/qwen-2.5-3b-r1-countdown-train_query_and_policy_v8__steps_450__bs_56__lr_5e7__seed_42
Text Generation
• 3B • Updated • 1
khuang2/qwen-2.5-3b-r1-countdown-train_query_and_policy_v9__steps_450__bs_56__lr_5e7__seed_42
Text Generation
• 3B • Updated • 2
tenacioustommy/Qwen2.5-1.5B-Open-R1-GRPO
Text Generation
• 3B • Updated • 2
khuang2/qwen-2.5-3b-r1-countdown_v1__steps_450__bs_224__lr_5e7__seed_42
Text Generation
• 3B • Updated • 1