Penghui Qi

QPHutu

AI & ML interests

None yet

Recent Activity

liked a dataset about 1 month ago

LLM360/guru-RL-92k

liked a dataset about 1 month ago

zwhe99/DeepMath-103K

updated a dataset about 1 month ago

sail/Sanity-Test-R1D-1.5B

Organizations

liked 2 datasets about 1 month ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20 • 91.9k • 2.49k • 40

zwhe99/DeepMath-103K

Viewer • Updated May 29 • 103k • 18.8k • 285

updated a dataset about 1 month ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15 • 1.52k • 62 • 6

liked a dataset about 1 month ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15 • 1.52k • 62 • 6

updated a collection about 1 month ago

Precision-RL

Collection

Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14

published a dataset about 1 month ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated Nov 15 • 1.52k • 62 • 6

updated a collection about 1 month ago

Precision-RL

Collection

Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14

liked a model about 2 months ago

zz1358m/SofT-GRPO-master

Updated Nov 13 • 7

upvoted a paper about 2 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5 • 127

authored a paper about 2 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30 • 29

upvoted a paper about 2 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30 • 29

commented a paper about 2 months ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published Oct 30 • 29

upvoted 2 papers 3 months ago

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26 • 70

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26 • 69

liked a dataset 3 months ago

SynthLabsAI/Big-Math-RL-Verified

Viewer • Updated Mar 25 • 251k • 5.18k • 213

upvoted 3 papers 4 months ago

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 227

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1 • 76

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2 • 83

updated a collection 5 months ago

LLM Agent

Collection

4 items • Updated Aug 4

upvoted a paper 6 months ago

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Paper • 2507.01352 • Published Jul 2 • 56
