ST Projects

Recent Activity

violetxi authored a paper about 22 hours ago

ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code

violetxi authored a paper about 22 hours ago

Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

violetxi authored a paper about 22 hours ago

LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing

authored 4 papers about 22 hours ago

ResearchCodeBench: Benchmarking LLMs on Implementing Novel Machine Learning Research Code

Paper • 2506.02314 • Published Jun 2, 2025

Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning

Paper • 2506.05256 • Published Jun 5, 2025 • 2

LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing

Paper • 2507.00769 • Published Jul 1, 2025 • 5

ExpRL: Exploratory RL for LLM Mid-Training

Paper • 2606.17024 • Published 15 days ago • 5

submitted a paper to Daily Papers 14 days ago

ExpRL: Exploratory RL for LLM Mid-Training

Paper • 2606.17024 • Published 15 days ago • 5

updated a dataset 10 months ago

st-projects/llm-judge-n10

Viewer • Updated Sep 13, 2025 • 20 • 2

published a dataset 10 months ago

st-projects/llm-judge-n10

Viewer • Updated Sep 13, 2025 • 20 • 2

updated a dataset 10 months ago

st-projects/quick-check

Viewer • Updated Sep 12, 2025 • 6 • 2

published a dataset 10 months ago

st-projects/quick-check

Viewer • Updated Sep 12, 2025 • 6 • 2

updated a dataset 10 months ago

st-projects/llm-judge-eval-n5-new-prompt

Viewer • Updated Sep 10, 2025 • 10 • 2

published a dataset 10 months ago

st-projects/llm-judge-eval-n5-new-prompt

Viewer • Updated Sep 10, 2025 • 10 • 2

updated a dataset 10 months ago

st-projects/llm-judge-evaluation-sample-size-5

Viewer • Updated Sep 6, 2025 • 10 • 2

published a dataset 10 months ago

st-projects/llm-judge-evaluation-sample-size-5

Viewer • Updated Sep 6, 2025 • 10 • 2

updated 2 datasets 10 months ago

st-projects/NuminaMath_ProofsOnlyNoLinks

Viewer • Updated Sep 6, 2025 • 75.6k • 3

st-projects/NaturalProofs_train_NoURLsOrFiles

Viewer • Updated Sep 5, 2025 • 12k • 2

published 2 datasets 10 months ago

st-projects/NaturalProofs_train_NoURLsOrFiles

Viewer • Updated Sep 5, 2025 • 12k • 2

st-projects/NuminaMath_ProofsOnlyNoLinks

Viewer • Updated Sep 6, 2025 • 75.6k • 3

authored 2 papers over 1 year ago

Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models

Paper • 2407.07086 • Published Jul 9, 2024

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8, 2025 • 99