Structured Packing in LLM Training Improves Long Context Utilization

Staniszewski, Konrad; Tworkowski, Szymon; Jaszczur, Sebastian; Zhao, Yu; Michalewski, Henryk; Kucinski, Lukasz; Milos, Piotr

doi:10.1609/AAAI.V39I24.34706

AAAI 2025 pp. 25201-25209

Abstract

Recent advancements in long-context language modeling have attracted significant attention, yet practical applications of these models often suffer from suboptimal context utilization. To address this issue efficiently, we introduce Structured Packing for Long Context (SPLiCe), a method that uses retrieval to collate mutually relevant documents into long training samples. We demonstrate that SPLiCe improves performance on long-context tasks, in particular achieving perfect accuracy on the synthetic Needle in the Haystack benchmark and effectively mitigating the ‘lost-in-the-middle’ phenomenon often observed in large language models. Notably, these long-context capabilities also extend to realistic downstream tasks, such as Qasper, across multiple model sizes (3B, 7B, and 13B), and are achieved with only brief fine-tuning on 2-6 billion tokens. We supplement these results with a detailed analysis of SPLiCe, examining the impact of hyperparameter choices, the different mixtures and proportions of SPLiCe-generated training data, and the choice of the retriever. We also study the transfer of long-context utilization skills between modalities. An intriguing finding from our analysis is that training on a corpus of code can enhance performance on natural language tasks.
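
As a rough illustration of the idea in the abstract (using retrieval to collate mutually relevant documents into one long training sample), the Python sketch below packs documents by expanding outward from a root document. It is not the authors' implementation. The Jaccard word-overlap score is only a stand-in for a real retriever, and the names `pack_related`, `top_k`, and `max_tokens`, as well as the breadth-first expansion order, are assumptions made for this example.

```python
from collections import deque


def tokenize(text):
    # Whitespace tokenization; a real pipeline would use the model tokenizer.
    return text.lower().split()


def jaccard(a, b):
    # Word-overlap similarity, used here as a toy stand-in for a retriever.
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def pack_related(docs, root, max_tokens, top_k=2):
    """Return indices of documents collated into one training sample.

    Starting from `root`, repeatedly retrieve the `top_k` most similar
    unused documents for each visited document until the token budget
    is reached or no documents remain. (Hypothetical helper.)
    """
    tokens = [tokenize(d) for d in docs]
    used = {root}
    order = [root]
    budget = len(tokens[root])
    queue = deque([root])
    while queue and budget < max_tokens:
        current = queue.popleft()
        # Rank all unused documents by similarity to the current one.
        candidates = sorted(
            (i for i in range(len(docs)) if i not in used),
            key=lambda i: jaccard(tokens[current], tokens[i]),
            reverse=True,
        )[:top_k]
        for i in candidates:
            if budget >= max_tokens:
                break
            used.add(i)
            order.append(i)
            budget += len(tokens[i])
            queue.append(i)
    return order


docs = [
    "def add(a, b): return a + b",
    "def sub(a, b): return a - b",
    "the cat sat on the mat",
    "a dog sat on the rug",
]
sample = pack_related(docs, root=0, max_tokens=14, top_k=1)
print(" ".join(docs[i] for i in sample))
```

With a small token budget the sample contains only the root and its closest neighbour; raising `max_tokens` lets the expansion reach less related documents. Swapping `jaccard` for a stronger retriever would change which documents end up together, which is the kind of choice the abstract says the paper analyzes.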

Cite

Text

Staniszewski et al. "Structured Packing in LLM Training Improves Long Context Utilization." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I24.34706

Markdown

[Staniszewski et al. "Structured Packing in LLM Training Improves Long Context Utilization." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/staniszewski2025aaai-structured/) doi:10.1609/AAAI.V39I24.34706

BibTeX

@inproceedings{staniszewski2025aaai-structured,
  title     = {{Structured Packing in LLM Training Improves Long Context Utilization}},
  author    = {Staniszewski, Konrad and Tworkowski, Szymon and Jaszczur, Sebastian and Zhao, Yu and Michalewski, Henryk and Kucinski, Lukasz and Milos, Piotr},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {25201--25209},
  doi       = {10.1609/AAAI.V39I24.34706},
  url       = {https://mlanthology.org/aaai/2025/staniszewski2025aaai-structured/}
}