Article • Red Teaming with RL: Exploiting Tinker API for Harmful RL on 235B Model • 9 days ago • 15
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published Mar 10, 2025 • 88