WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning Paper • 2509.13305 • Published Sep 16, 2025 • 91
Towards General Agentic Intelligence via Environment Scaling Paper • 2509.13311 • Published Sep 16, 2025 • 71
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research Paper • 2509.13312 • Published Sep 16, 2025 • 105
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents Paper • 2509.13309 • Published Sep 16, 2025 • 67
WebSailor: Navigating Super-human Reasoning for Web Agent Paper • 2507.02592 • Published Jul 3, 2025 • 123
Illustrating Reinforcement Learning from Human Feedback (RLHF) Article • Published Dec 9, 2022 • 390
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament Paper • 2501.13007 • Published Jan 22, 2025 • 19
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style Paper • 2410.16184 • Published Oct 21, 2024 • 25