An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding

Abstract

Recently, many methods have been developed to extend the context length of pre-trained large language models (LLMs), but they often require fine-tuning at the target length ($\gg 4K$) and struggle to effectively utilize information from the middle part of the context. To address these issues, we propose $\textbf{C}$ontinuity-$\textbf{R}$elativity ind$\textbf{E}$xing with g$\textbf{A}$ussian $\textbf{M}$iddle ($\texttt{CREAM}$), which interpolates positional encodings by manipulating position indices. Apart from being simple, $\texttt{CREAM}$ is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K). To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the ``Lost-in-the-Middle'' problem faced by long-context LLMs. Experimental results show that $\texttt{CREAM}$ successfully extends LLMs to the target length for both Base and Chat versions of $\texttt{Llama2-7B}$ with ``Never Miss A Beat''. Our code is publicly available at https://github.com/bigai-nlco/cream.
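The truncated-Gaussian middle sampling mentioned in the abstract can be pictured with a minimal sketch. The snippet below is illustrative only, not the authors' implementation: it assumes a hypothetical helper `sample_middle_start` that draws the start index of a "middle" segment from a Gaussian centered on the midpoint of the extended position range, truncated so the sampled window stays in bounds; `sigma_frac` and all other parameter names are assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_middle_start(target_len: int = 262_144,
                        window: int = 4_096,
                        sigma_frac: float = 0.25,
                        rng=None) -> int:
    """Sample the start index of the 'middle' segment (illustrative sketch).

    Draws from a Gaussian centered on the midpoint of [0, target_len - window],
    truncated to that interval, so position indices near the middle of the
    extended range are sampled most often. Parameter names are hypothetical,
    not taken from the paper's code.
    """
    lo, hi = 0, target_len - window              # valid start positions
    mu = (lo + hi) / 2                           # center of the range
    sigma = sigma_frac * (hi - lo)               # spread of the Gaussian
    a, b = (lo - mu) / sigma, (hi - mu) / sigma  # standardized truncation bounds
    start = truncnorm(a, b, loc=mu, scale=sigma).rvs(random_state=rng)
    return int(start)

# Example: position indices assigned to a short fine-tuning chunk.
start = sample_middle_start()
positions = np.arange(start, start + 4_096)
```

Under these assumptions, fine-tuning at the short pre-trained window still exposes the model disproportionately to position indices from the middle of the much longer target range, which is the intuition behind the abstract's claim about alleviating ``Lost-in-the-Middle''.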

Cite

Text

Wu et al. "An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding." Neural Information Processing Systems, 2024. doi:10.52202/079017-1794

Markdown

[Wu et al. "An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wu2024neurips-efficient/) doi:10.52202/079017-1794

BibTeX

@inproceedings{wu2024neurips-efficient,
  title     = {{An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding}},
  author    = {Wu, Tong and Zhao, Yanpeng and Zheng, Zilong},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1794},
  url       = {https://mlanthology.org/neurips/2024/wu2024neurips-efficient/}
}