arxiv:2601.03822

ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition

Published on Jan 7 · Submitted by HanbingLiu on Jan 8
Abstract

A budgeted inference-time reasoning framework enables large language models to make strategic computational decisions by predicting costs and utilities before generation, then optimizing sequential allocation under strict token constraints through meta-cognitive fine-tuning and reinforcement learning.

AI-generated summary

Large language models (LLMs) can achieve strong reasoning performance with sufficient computation, but they do not inherently know how much computation a task requires. We study budgeted inference-time reasoning for multiple tasks under a strict global token constraint and formalize it as an Ordered Stochastic Multiple-Choice Knapsack Problem (OS-MCKP). This perspective highlights a meta-cognitive requirement: anticipating task difficulty, estimating return on investment (ROI), and allocating computation strategically. We propose ROI-Reasoning, a two-stage framework that endows LLMs with intrinsic, budget-aware rationality. In the first stage, Meta-Cognitive Fine-Tuning teaches models to predict reasoning cost and expected utility before generation, enabling explicit solve-or-skip decisions. In the second stage, Rationality-Aware Reinforcement Learning optimizes sequential decision making under a hard token budget, allowing models to learn long-horizon allocation strategies. Across budgeted mathematical reasoning benchmarks, ROI-Reasoning consistently improves overall score while substantially reducing regret under tight computation budgets.
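
To make the knapsack view concrete, here is one plausible way to write the OS-MCKP objective; the notation (modes j, costs c, utilities u, budget B) is ours and may differ from the paper's:

```latex
% Tasks i = 1..n arrive in a fixed order; for each, the controller picks one
% reasoning mode a_i from a menu (e.g. skip, short chain, long chain) whose
% token cost c_{i,a_i} and utility u_{i,a_i} are stochastic and only realized
% after the choice. B is the hard global token budget.
\max_{\pi}\; \mathbb{E}\!\left[\sum_{i=1}^{n} u_{i,a_i}\right]
\quad \text{s.t.} \quad \sum_{i=1}^{n} c_{i,a_i} \le B,
\qquad a_i = \pi\!\left(x_i,\; B - \textstyle\sum_{k<i} c_{k,a_k}\right)
```

where the policy π must commit to each a_i online, seeing only the current task x_i and the remaining budget, never the tasks still to come. This is what makes the problem ordered and stochastic rather than a plain offline knapsack.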

Community

Paper author

ROI-Reasoning introduces a principled framework for budget-aware inference-time reasoning in large language models. Instead of blindly scaling computation, the authors formulate multi-task reasoning under a global token constraint as an Ordered Stochastic Multiple-Choice Knapsack Problem, explicitly modeling the trade-off between reasoning cost and expected utility. The proposed two-stage approach combines Meta-Cognitive Fine-Tuning, which enables models to anticipate difficulty and make solve-or-skip decisions before reasoning, with Rationality-Aware Reinforcement Learning, which optimizes long-horizon computation allocation under strict budgets. Across challenging mathematical reasoning benchmarks, ROI-Reasoning consistently improves total score and substantially reduces regret, demonstrating that meta-cognitive planning, not just stronger reasoning, is key to efficient test-time scaling of LLMs.
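
To illustrate the solve-or-skip mechanics, below is a minimal Python sketch of a greedy controller in this spirit. It is not the paper's method: `predict` stands in for the stage-one meta-cognitive predictor, `solve` for the reasoning model itself, and the fixed ROI threshold for what the paper instead learns with rationality-aware RL.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class Decision:
    task_id: int
    solved: bool
    tokens_spent: int

def roi_controller(
    tasks: Iterable[str],
    predict: Callable[[str], Tuple[float, float]],   # task -> (expected utility, expected token cost)
    solve: Callable[[str, int], Tuple[float, int]],  # (task, token cap) -> (utility, tokens actually used)
    budget: int,
    roi_threshold: float,                            # minimum predicted utility per token; tuned per budget
) -> List[Decision]:
    """Greedy solve-or-skip allocation under a hard global token budget.

    For each task, in arrival order, query the meta-cognitive predictor for
    expected utility and cost *before* generating any reasoning. Solve only
    if the predicted cost fits in the remaining budget and the predicted ROI
    clears the threshold; otherwise skip and save tokens for later tasks.
    """
    remaining = budget
    decisions: List[Decision] = []
    for i, task in enumerate(tasks):
        util_hat, cost_hat = predict(task)
        roi = util_hat / max(cost_hat, 1.0)  # predicted utility per token
        if cost_hat <= remaining and roi >= roi_threshold:
            _, used = solve(task, remaining)  # the hard cap keeps the global budget intact
            remaining -= used
            decisions.append(Decision(i, solved=True, tokens_spent=used))
        else:
            decisions.append(Decision(i, solved=False, tokens_spent=0))
    return decisions
```

A fixed threshold like this is myopic: it ignores how many tasks remain and how hard they are likely to be. Replacing it with a policy trained end-to-end under the budget is, as the abstract describes, what the reinforcement learning stage is for.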

Paper submitter

A cool paper about budget-aware inference-time reasoning in LLMs!

