K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 15 days ago • 56
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published May 9 • 80
KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context Paper • 2604.13058 • Published Mar 18 • 2