Structured Packing in LLM Training Improves Long Context Utilization
Abstract
Recent advancements in long-context language modeling have attracted significant attention, yet their practical applications often suffer from suboptimal context utilization. To address this issue efficiently, we introduce Structured Packing for Long Context (SPLiCe), a method that uses retrieval to collate mutually relevant documents into long training samples. We demonstrate that SPLiCe improves performance on long-context tasks, in particular achieving perfect accuracy on the synthetic Needle in the Haystack benchmark and effectively mitigating the ‘lost-in-the-middle’ phenomenon often observed in large language models. Notably, these long-context capabilities also extend to realistic downstream tasks, such as Qasper, across multiple model sizes (3B, 7B, and 13B), and are achieved with only brief fine-tuning on 2 to 6 billion tokens. We supplement these results with a detailed analysis of SPLiCe, examining the impact of hyperparameter choices, the different mixtures and proportions of SPLiCe-generated training data, and the choice of the retriever. We also study the transfer of long-context utilization skills between modalities. An intriguing finding from our analysis is that training on a corpus of code can enhance performance on natural language tasks.
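The core idea, using retrieval to collate mutually relevant documents into long training samples, can be illustrated with a minimal sketch. The abstract does not specify the procedure, so everything below is an assumption made for illustration: the function names `jaccard` and `pack_related`, the breadth-first expansion, the `top_k` parameter, whitespace tokenization, and the use of token-set overlap as a toy stand-in for a real retriever.

```python
from collections import deque


def jaccard(a: set, b: set) -> float:
    """Token-set overlap; a toy stand-in for a real retriever (assumption)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def pack_related(docs, max_tokens, top_k=2):
    """Greedily group similar documents into training samples.

    docs: list of strings. Returns a list of samples, each a list of
    document indices whose total whitespace-token count stays within
    max_tokens (a single oversized document still forms its own sample).
    """
    tokens = [d.lower().split() for d in docs]
    token_sets = [set(t) for t in tokens]
    unused = set(range(len(docs)))
    samples = []

    while unused:
        root = min(unused)  # deterministic choice of the starting document
        unused.remove(root)
        sample, length = [root], len(tokens[root])
        queue = deque([root])

        # Breadth-first expansion: retrieve neighbours of each packed document.
        while queue and length < max_tokens:
            current = queue.popleft()
            ranked = sorted(
                unused,
                key=lambda j: (-jaccard(token_sets[current], token_sets[j]), j),
            )
            for j in ranked[:top_k]:
                if length + len(tokens[j]) > max_tokens:
                    continue  # skip documents that would overflow the sample
                unused.remove(j)
                sample.append(j)
                length += len(tokens[j])
                queue.append(j)
        samples.append(sample)
    return samples
```

In this sketch each document is used exactly once, and related documents end up in the same sample rather than being concatenated in arbitrary order. A practical pipeline would likely replace `jaccard` with a stronger retriever and count tokens with the model's tokenizer.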
Cite
Text
Staniszewski et al. "Structured Packing in LLM Training Improves Long Context Utilization." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I24.34706
Markdown
[Staniszewski et al. "Structured Packing in LLM Training Improves Long Context Utilization." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/staniszewski2025aaai-structured/) doi:10.1609/AAAI.V39I24.34706
BibTeX
@inproceedings{staniszewski2025aaai-structured,
title = {{Structured Packing in LLM Training Improves Long Context Utilization}},
author = {Staniszewski, Konrad and Tworkowski, Szymon and Jaszczur, Sebastian and Zhao, Yu and Michalewski, Henryk and Kucinski, Lukasz and Milos, Piotr},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {25201-25209},
doi = {10.1609/AAAI.V39I24.34706},
url = {https://mlanthology.org/aaai/2025/staniszewski2025aaai-structured/}
}