arxiv:2601.03822

ROI-Reasoning: Rational Optimization for Inference via Pre-Computation Meta-Cognition

Published on Jan 7 · Submitted by HanbingLiu on Jan 8
Abstract

A budgeted inference-time reasoning framework enables large language models to make strategic computational decisions by predicting costs and utilities before generation, then optimizing sequential allocation under strict token constraints through meta-cognitive fine-tuning and reinforcement learning.

AI-generated summary

Large language models (LLMs) can achieve strong reasoning performance with sufficient computation, but they do not inherently know how much computation a task requires. We study budgeted inference-time reasoning for multiple tasks under a strict global token constraint and formalize it as an Ordered Stochastic Multiple-Choice Knapsack Problem (OS-MCKP). This perspective highlights a meta-cognitive requirement: anticipating task difficulty, estimating return on investment (ROI), and allocating computation strategically. We propose ROI-Reasoning, a two-stage framework that endows LLMs with intrinsic, budget-aware rationality. In the first stage, Meta-Cognitive Fine-Tuning teaches models to predict reasoning cost and expected utility before generation, enabling explicit solve-or-skip decisions. In the second stage, Rationality-Aware Reinforcement Learning optimizes sequential decision making under a hard token budget, allowing models to learn long-horizon allocation strategies. Across budgeted mathematical reasoning benchmarks, ROI-Reasoning consistently improves overall score while substantially reducing regret under tight computation budgets.
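
To make the knapsack view concrete, here is one plausible way to write the OS-MCKP objective; the notation (modes j, costs c, utilities u, budget B) is ours and may differ from the paper's:

```latex
% Tasks i = 1..n arrive in a fixed order; for each, the controller picks one
% reasoning mode a_i from a menu (e.g. skip, short chain, long chain) whose
% token cost c_{i,a_i} and utility u_{i,a_i} are stochastic and only realized
% after the choice. B is the hard global token budget.
\max_{\pi}\; \mathbb{E}\!\left[\sum_{i=1}^{n} u_{i,a_i}\right]
\quad \text{s.t.} \quad \sum_{i=1}^{n} c_{i,a_i} \le B,
\qquad a_i = \pi\!\left(x_i,\; B - \textstyle\sum_{k<i} c_{k,a_k}\right)
```

where the policy π must commit to each a_i online, seeing only the current task x_i and the remaining budget, never the tasks still to come. This is what makes the problem ordered and stochastic rather than a plain offline knapsack.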

Community

Paper author

ROI-Reasoning introduces a principled framework for budget-aware inference-time reasoning in large language models. Instead of blindly scaling computation, the authors formulate multi-task reasoning under a global token constraint as an Ordered Stochastic Multiple-Choice Knapsack Problem, explicitly modeling the trade-off between reasoning cost and expected utility. The proposed two-stage approach combines Meta-Cognitive Fine-Tuning, which enables models to anticipate difficulty and make solve-or-skip decisions before reasoning, with Rationality-Aware Reinforcement Learning, which optimizes long-horizon computation allocation under strict budgets. Across challenging mathematical reasoning benchmarks, ROI-Reasoning consistently improves total score and substantially reduces regret, demonstrating that meta-cognitive planning, not just stronger reasoning, is key to efficient test-time scaling of LLMs.
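
To illustrate the solve-or-skip mechanics, below is a minimal Python sketch of a greedy controller in this spirit. It is not the paper's method: `predict` stands in for the stage-one meta-cognitive predictor, `solve` for the reasoning model itself, and the fixed ROI threshold for what the paper instead learns with rationality-aware RL.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class Decision:
    task_id: int
    solved: bool
    tokens_spent: int

def roi_controller(
    tasks: Iterable[str],
    predict: Callable[[str], Tuple[float, float]],   # task -> (expected utility, expected token cost)
    solve: Callable[[str, int], Tuple[float, int]],  # (task, token cap) -> (utility, tokens actually used)
    budget: int,
    roi_threshold: float,                            # minimum predicted utility per token; tuned per budget
) -> List[Decision]:
    """Greedy solve-or-skip allocation under a hard global token budget.

    For each task, in arrival order, query the meta-cognitive predictor for
    expected utility and cost *before* generating any reasoning. Solve only
    if the predicted cost fits in the remaining budget and the predicted ROI
    clears the threshold; otherwise skip and save tokens for later tasks.
    """
    remaining = budget
    decisions: List[Decision] = []
    for i, task in enumerate(tasks):
        util_hat, cost_hat = predict(task)
        roi = util_hat / max(cost_hat, 1.0)  # predicted utility per token
        if cost_hat <= remaining and roi >= roi_threshold:
            _, used = solve(task, remaining)  # the hard cap keeps the global budget intact
            remaining -= used
            decisions.append(Decision(i, solved=True, tokens_spent=used))
        else:
            decisions.append(Decision(i, solved=False, tokens_spent=0))
    return decisions
```

A fixed threshold like this is myopic: it ignores how many tasks remain and how hard they are likely to be. Replacing it with a policy trained end-to-end under the budget is, as the abstract describes, what the reinforcement learning stage is for.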

Paper submitter

A cool paper about budget-aware inference-time reasoning in LLMs!

