RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents Paper • 2606.19047 • Published 12 days ago • 4
OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification Paper • 2606.01476 • Published 29 days ago • 8
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 28 days ago • 235
PANDO: Efficient Multimodal AI Agents via Online Skill Distillation Paper • 2605.24785 • Published May 26 • 11
Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration Paper • 2605.17423 • Published May 17 • 34
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs Paper • 2505.11277 • Published May 16, 2025 • 29
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published May 27 • 431
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published May 20 • 207
Training-Free Dense Hand Contact Estimation with Multi-Modal Large Language Models Paper • 2605.05886 • Published May 7 • 3
PageGuide: Browser extension to assist users in navigating a webpage and locating information Paper • 2604.23772 • Published Apr 26 • 7
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering Paper • 2604.08224 • Published Apr 9 • 53
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? Paper • 2604.03016 • Published Apr 3 • 37