The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 45
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs Paper • 2506.15211 • Published Jun 18, 2025 • 38
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning Paper • 2505.11049 • Published May 16, 2025 • 60
Efficient Inference for Large Reasoning Models: A Survey Paper • 2503.23077 • Published Mar 29, 2025 • 46