WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks Paper • 2601.02439 • Published 4 days ago • 12
VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation Paper • 2510.23497 • Published Oct 27, 2025 • 1
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 29
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 59
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Article • Published Feb 11, 2025 • 97
The Unreasonable Effectiveness of Scaling Agents for Computer Use Paper • 2510.02250 • Published Oct 2, 2025 • 24
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents Paper • 2504.00906 • Published Apr 1, 2025 • 27
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published Jun 7, 2024 • 59
The Prompt Report: A Systematic Survey of Prompting Techniques Paper • 2406.06608 • Published Jun 6, 2024 • 68
FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models Paper • 2403.07747 • Published Mar 12, 2024 • 1
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published May 1, 2024 • 32
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14, 2024 • 26
ResLoRA: Identity Residual Mapping in Low-Rank Adaption Paper • 2402.18039 • Published Feb 28, 2024 • 11