Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs

Liu, Yihe; Wang, Huibin; Hu, Xianming; Zhang, Pinyi; Xiong, Jiahao; Wang, Chenglin; Chen, Nuoyi; Zhao, Hongbo; Zhang, Jie; Zhang, Kai

ICLR 2026

Abstract

Auto-regressive large language models (LLMs) exhibit a non-trivial capacity to "anticipate" long-range future tokens despite being trained to predict only one token at a time. Nevertheless, how to systematically profile, enhance, and leverage this capacity to improve LLM reasoning performance in practice remains unclear. In this paper, we propose **Next Token-Bag Exploitation (Next-ToBE)** to tackle this challenge. Next-ToBE quantifies an LLM’s anticipatory capacity by measuring how well tokens in the future window are pre-captured by the model’s current softmax probabilities. This capacity is strongly correlated with LLM generative quality but is often suppressed by the rigid one-hot objective in next-token prediction. To address this, we replace the one-hot target vector in next-token prediction with a soft target distribution spanning additional future tokens. Specifically, the immediate next token retains the highest importance, while more distant "look-ahead tokens" are also included to enrich supervision and inject forward-looking pressure, with their importance dynamically determined by temporal and semantic relevance patterns. In addition, the fitting process emphasizes the model’s intrinsic anticipatory tendency, thus preserving the confidence and fidelity of the pre-trained model and improving training stability. Overall, Next-ToBE effectively activates LLM anticipatory capacity through fine-tuning, yielding notable gains in reasoning performance with higher memory and computational efficiency than the multi-token prediction (MTP) baselines, and it also shows great potential in the pretraining setting by successfully cultivating this capacity from scratch. These results highlight its value as an effective strategy to extend the prediction horizon of LLMs, enabling them to see further and reason better.
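
The two ideas in the abstract, a soft target over a bag of future tokens and a score for how much of the future window the current softmax already covers, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `window` and `decay` parameters, and the geometric temporal weighting are all assumptions made for illustration. The semantic-relevance weighting and the emphasis on the model's intrinsic tendency described in the abstract are not modelled here.

```python
import math


def soft_bag_target(tokens, t, window=3, decay=0.5):
    """Soft target for position t over the tokens t+1 .. t+window.

    Assumed weighting (not from the paper): the token k steps beyond the
    immediate next token gets weight decay**k, so the next token (k = 0)
    always keeps the largest share. Repeated tokens keep the weight of
    their earliest occurrence.
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must lie strictly between 0 and 1")
    future = tokens[t + 1 : t + 1 + window]
    if not future:
        raise ValueError("position t has no future tokens")
    weights = {}
    for k, tok in enumerate(future):
        if tok not in weights:
            weights[tok] = decay ** k
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def softmax(logits):
    """Numerically stable softmax over a {token: logit} mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}
    norm = sum(exps.values())
    return {tok: e / norm for tok, e in exps.items()}


def cross_entropy(target, probs):
    """Cross-entropy between a soft target and the predicted distribution."""
    return -sum(p * math.log(probs[tok]) for tok, p in target.items())


def window_mass(probs, tokens, t, window=3):
    """Hypothetical anticipation score: the probability mass that the
    current prediction places on the set of tokens in the future window."""
    bag = set(tokens[t + 1 : t + 1 + window])
    return sum(probs.get(tok, 0.0) for tok in bag)


# Toy usage with a uniform prediction over a five-token vocabulary.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
target = soft_bag_target(tokens, t=0)
uniform = softmax({tok: 0.0 for tok in set(tokens)})
print(target)
print(window_mass(uniform, tokens, t=0))
print(cross_entropy(target, uniform))
```

Setting `window=1` reduces the soft target to the usual one-hot next-token target, which makes the sketch a strict generalization of standard next-token cross-entropy under these assumptions.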

Cite

Text

Liu et al. "Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs." International Conference on Learning Representations, 2026.

Markdown

[Liu et al. "Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/liu2026iclr-nexttobe/)

BibTeX

@inproceedings{liu2026iclr-nexttobe,
  title     = {{Next-ToBE: Probabilistic Next Token-Bag Exploitation for Activating Anticipatory Capacity in LLMs}},
  author    = {Liu, Yihe and Wang, Huibin and Hu, Xianming and Zhang, Pinyi and Xiong, Jiahao and Wang, Chenglin and Chen, Nuoyi and Zhao, Hongbo and Zhang, Jie and Zhang, Kai},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/liu2026iclr-nexttobe/}
}