The Station: An Open-World Environment for AI-Driven Discovery Paper • 2511.06309 • Published Nov 9 • 36
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26 • 70
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published Jul 30 • 46
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs Paper • 2412.16855 • Published Dec 22, 2024 • 5
How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation Paper • 2502.14642 • Published Feb 20 • 1
PEToolLLM: Towards Personalized Tool Learning in Large Language Models Paper • 2502.18980 • Published Feb 26
TokenSkip: Controllable Chain-of-Thought Compression in LLMs Paper • 2502.12067 • Published Feb 17 • 3
Beyond Single Frames: Can LMMs Comprehend Temporal and Contextual Narratives in Image Sequences? Paper • 2502.13925 • Published Feb 19
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors Paper • 2502.13311 • Published Feb 18 • 2
Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region Paper • 2502.13946 • Published Feb 19 • 10