rr123

raymond1113

AI & ML interests

None yet

Recent Activity

liked a dataset about 2 hours ago

lighteval/code_generation_lite

upvoted a paper 4 months ago

Scaling Agents via Continual Pre-training

upvoted a paper 4 months ago

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Organizations

None yet

liked a dataset about 2 hours ago

lighteval/code_generation_lite

Viewer • Updated Aug 15, 2025 • 12.8k • 6.01k • 2

upvoted 5 papers 4 months ago

Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16, 2025 • 117

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Paper • 2509.13305 • Published Sep 16, 2025 • 91

Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16, 2025 • 71

WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

Paper • 2509.13312 • Published Sep 16, 2025 • 105

WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents

Paper • 2509.13309 • Published Sep 16, 2025 • 67

upvoted a collection 6 months ago

Qwen3

Collection • 84 items • Updated 10 days ago • 1.55k

upvoted a paper 6 months ago

WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3, 2025 • 123

liked 4 datasets 9 months ago

liked a model 9 months ago

qqqzzzyyy/qwen2.5-1.5b-simple-rl-math3to5-adaptive_s4

Updated Apr 14, 2025 • 1

upvoted a collection 10 months ago

DeepSeek-R1

Collection • 10 items • Updated Nov 27, 2025 • 827

upvoted an article 11 months ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Published Dec 9, 2022 • 390

upvoted a paper 12 months ago

Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament

Paper • 2501.13007 • Published Jan 22, 2025 • 19

upvoted a paper about 1 year ago

RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style

Paper • 2410.16184 • Published Oct 21, 2024 • 25
