Generalizing Reasoning Problems to Longer Lengths

Abstract

Length generalization (LG) is a challenging problem in learning to reason. It refers to the phenomenon whereby a model trained on reasoning problems of smaller lengths or sizes struggles with problems of larger lengths or sizes. Although it has been proven that reasoning can be learned when the intermediate reasoning steps, known as a chain of thought (CoT), are given in the training data, existing results apply only within a given length (interpolation), whereas LG concerns extrapolation beyond that length. This paper begins by presenting a theorem that identifies the root cause of the LG problem. It then defines a class of reasoning problems for which achieving LG with Transformers can be theoretically guaranteed, provided the CoT schemes are constructed to meet a proposed condition called $(n,r)$-consistency.
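
To make the LG setup concrete, the sketch below illustrates the train-short/test-long evaluation protocol on a toy multi-digit addition task. The task, the length cutoffs, and the make_addition_example / exact_match_accuracy helpers are illustrative assumptions for exposition only, not constructions from the paper.

# Hypothetical sketch of the train-short / test-long protocol that defines
# length generalization (LG). The task, split sizes, and model stub are
# illustrative assumptions, not the paper's construction.
import random

def make_addition_example(n_digits: int) -> tuple[str, str]:
    """Sample an n-digit addition problem and its answer string."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return f"{a}+{b}=", str(a + b)

# Training lengths (interpolation regime) vs. held-out longer lengths
# (extrapolation regime, where LG failures show up).
TRAIN_LENGTHS = range(1, 6)    # train on 1- to 5-digit operands
TEST_LENGTHS = range(6, 11)    # evaluate on 6- to 10-digit operands

train_set = [make_addition_example(n) for n in TRAIN_LENGTHS for _ in range(100)]
test_set = [make_addition_example(n) for n in TEST_LENGTHS for _ in range(100)]

def exact_match_accuracy(model, dataset) -> float:
    """Fraction of problems the model answers exactly right."""
    correct = sum(model(prompt) == answer for prompt, answer in dataset)
    return correct / len(dataset)

# `model` would be a Transformer trained (with CoT supervision) on train_set;
# LG asks whether exact_match_accuracy(model, test_set) stays high even
# though every test problem is longer than anything seen in training.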

Cite

Text

Xiao and Liu. "Generalizing Reasoning Problems to Longer Lengths." International Conference on Learning Representations, 2025.

Markdown

[Xiao and Liu. "Generalizing Reasoning Problems to Longer Lengths." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/xiao2025iclr-generalizing/)

BibTeX

@inproceedings{xiao2025iclr-generalizing,
  title     = {{Generalizing Reasoning Problems to Longer Lengths}},
  author    = {Xiao, Changnan and Liu, Bing},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/xiao2025iclr-generalizing/}
}