Seungone Kim PRO

seungone

https://seungonekim.github.io/

AI & ML interests

Large Language Models, LLM-as-a-Judge, Reward Model Overoptimization, Personalized Alignment

Recent Activity

authored a paper 2 days ago

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

authored a paper 2 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

upvoted a paper 5 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

authored 2 papers 2 days ago

Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization

Paper • 2605.26457 • Published 13 days ago • 6

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published 7 days ago • 53

upvoted a paper 5 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published 7 days ago • 53

updated a dataset 5 days ago

prometheus-eval/k-browsecomp

Viewer • Updated 5 days ago • 700 • 775 • 6

submitted a paper to Daily Papers 5 days ago

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Paper • 2606.02404 • Published 7 days ago • 53

liked a dataset 6 days ago

prometheus-eval/k-browsecomp

Viewer • Updated 5 days ago • 700 • 775 • 6

liked a dataset 11 days ago

prometheus-eval/peerreview-bench

Viewer • Updated 11 days ago • 27.4k • 232 • 1

updated a dataset 11 days ago

prometheus-eval/peerreview-bench

Viewer • Updated 11 days ago • 27.4k • 232 • 1

authored a paper 17 days ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published 19 days ago • 12

upvoted a paper 18 days ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published 19 days ago • 12

submitted a paper to Daily Papers 18 days ago

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Paper • 2605.20668 • Published 19 days ago • 12

upvoted a paper 24 days ago

VibeProteinBench: An Evaluation Benchmark for Language-interfaced Vibe Protein Design

Paper • 2605.10978 • Published 26 days ago • 19

authored 2 papers 25 days ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Paper • 2603.18886 • Published Mar 19 • 6

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published 30 days ago • 80

upvoted a paper 27 days ago

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

Paper • 2605.09063 • Published 30 days ago • 80

published a dataset about 2 months ago

prometheus-eval/peerreview-bench

Viewer • Updated 11 days ago • 27.4k • 232 • 1

upvoted a paper 3 months ago

Reasoning over mathematical objects: on-policy reward modeling and test time aggregation

Paper • 2603.18886 • Published Mar 19 • 6

authored 3 papers 5 months ago

Measuring Sycophancy of Language Models in Multi-turn Dialogues

Paper • 2505.23840 • Published May 28, 2025 • 3

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

Paper • 2507.00432 • Published Jul 1, 2025 • 79

OptimalThinkingBench: Evaluating Over and Underthinking in LLMs

Paper • 2508.13141 • Published Aug 18, 2025
