EntropyLong: Effective Long-Context Training via Predictive Uncertainty

Jia, Junlong; Chen, Ziyang; W, Xing; Gao, Chaochen; Lin, Zijia; Hu, Songlin; GuoBinghui

ICLR 2026

Abstract

Training long-context language models to capture long-range dependencies requires specialized data construction. Current approaches, such as generic text concatenation or heuristic-based variants, frequently fail to guarantee genuine long-range dependencies. We propose EntropyLong, a novel data construction method that leverages predictive uncertainty to verify dependency quality. Our approach identifies high-entropy positions in documents, retrieves semantically relevant contexts from large corpora, and verifies their utility by assessing whether they reduce prediction entropy. This model-in-the-loop verification ensures each dependency represents measurable information gain rather than spurious correlation. We construct training samples with long-range dependencies by combining original documents with these verified contextual supplements. Using FineWeb-Edu and Cosmopedia, we generate a dataset of 128K-length sequences with verified dependencies. Models trained on this data demonstrate significant improvements on RULER benchmarks, particularly in tasks requiring distant information. Following instruction fine-tuning, our models also achieve substantial gains on LongBench-v2, demonstrating enhanced long-context understanding. Extensive ablation studies further validate the necessity and effectiveness of entropy-based verification for long-context training.
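The select-then-verify idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the entropy threshold, the relative-reduction criterion, and the toy next-token distributions are all assumptions made for illustration; the abstract states only that high-entropy positions are identified and that retrieved contexts are kept when they reduce prediction entropy.

```python
import math
from typing import List, Sequence

Distribution = Sequence[float]  # a next-token probability distribution


def entropy(probs: Distribution) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(dists: List[Distribution], threshold: float) -> List[int]:
    """Return the positions whose predictive entropy exceeds `threshold`.

    `threshold` is a hypothetical hyperparameter, not a value from the paper.
    """
    return [i for i, d in enumerate(dists) if entropy(d) > threshold]


def is_verified(before: Distribution, after: Distribution, min_reduction: float) -> bool:
    """Accept a retrieved context only if it lowers entropy at the position.

    `before` is the model's distribution without the retrieved context and
    `after` is the distribution with it prepended. The relative-reduction
    test and `min_reduction` are assumptions made for this sketch.
    """
    h_before = entropy(before)
    if h_before == 0.0:
        return False  # nothing to reduce
    return (h_before - entropy(after)) / h_before >= min_reduction


# Toy distributions standing in for a language model's outputs.
document_dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain prediction
    [0.90, 0.05, 0.03, 0.02],  # fairly confident prediction
]
targets = high_entropy_positions(document_dists, threshold=1.0)

helpful_context = [0.70, 0.10, 0.10, 0.10]    # distribution after a useful context
unhelpful_context = [0.30, 0.25, 0.25, 0.20]  # distribution after an irrelevant one

keep_helpful = is_verified(document_dists[1], helpful_context, min_reduction=0.2)
keep_unhelpful = is_verified(document_dists[1], unhelpful_context, min_reduction=0.2)
```

In a real pipeline the distributions would come from a language model's forward pass, and candidate contexts from a retriever over the corpus; both are outside the scope of this sketch.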

Cite

Text

Jia et al. "EntropyLong: Effective Long-Context Training via Predictive Uncertainty." International Conference on Learning Representations, 2026.

Markdown

[Jia et al. "EntropyLong: Effective Long-Context Training via Predictive Uncertainty." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/jia2026iclr-entropylong/)

BibTeX

@inproceedings{jia2026iclr-entropylong,
  title     = {{EntropyLong: Effective Long-Context Training via Predictive Uncertainty}},
  author    = {Jia, Junlong and Chen, Ziyang and W, Xing and Gao, Chaochen and Lin, Zijia and Hu, Songlin and GuoBinghui},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/jia2026iclr-entropylong/}
}