LongAlign: A Recipe for Long Context Alignment of Large Language Models

Abstract

Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length. To address this, we present LongAlign---a recipe covering the instruction data, training, and evaluation for long context alignment. First, we construct a long instruction-following dataset using Self-Instruct. To ensure data diversity, it covers a broad range of tasks drawn from various long context sources. Second, we adopt packing and sorted batching strategies to speed up supervised fine-tuning on data with a varied length distribution. Additionally, we develop a loss weighting method to balance each sequence's contribution to the loss during packing training. Third, we introduce the LongBench-Chat benchmark for evaluating instruction-following capabilities on queries of 10k-100k in length. Experiments show that LongAlign outperforms existing recipes for LLMs on long context tasks by up to 30%, while also maintaining their proficiency in handling short, generic tasks.
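The loss weighting idea described in the abstract can be illustrated with a short sketch. The following PyTorch snippet is a minimal, hypothetical example and not the authors' implementation; the function name `packed_loss_weighted` and its arguments are assumptions made for illustration. The point it demonstrates: when several sequences are packed into one training example, a plain token-level mean weights long sequences more heavily, so instead the per-token losses are averaged within each sequence first and then averaged across sequences, giving every sequence equal weight.

```python
import torch

def packed_loss_weighted(token_losses, seq_ids, num_sequences):
    """Balance per-sequence contributions to the loss for packed training.

    token_losses: 1-D tensor of per-token losses (target tokens only),
                  e.g. from CrossEntropyLoss(reduction="none").
    seq_ids:      1-D int64 tensor mapping each target token to the index
                  of the sequence it came from (0 .. num_sequences - 1).
    Returns a scalar loss in which every packed sequence, long or short,
    contributes equally instead of in proportion to its token count.
    """
    # Number of target tokens per sequence (clamped to avoid division by zero).
    counts = torch.bincount(seq_ids, minlength=num_sequences).clamp(min=1)
    # Sum token losses per sequence, then normalize by that sequence's length.
    per_seq = torch.zeros(num_sequences, dtype=token_losses.dtype,
                          device=token_losses.device)
    per_seq.scatter_add_(0, seq_ids, token_losses)
    per_seq = per_seq / counts
    # Average across sequences, not across tokens.
    return per_seq.mean()

# Example: two packed sequences, one with 3 target tokens and one with 1.
losses = torch.tensor([0.5, 0.7, 0.6, 2.0])
ids = torch.tensor([0, 0, 0, 1])
print(packed_loss_weighted(losses, ids, 2))  # 1.3; a naive token mean gives 0.95
```

Under this weighting, the short single-token sequence counts as much as the three-token one, which is the balancing effect the paper's loss weighting method targets.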

Cite

Text

Bai et al. "LongAlign: A Recipe for Long Context Alignment of Large Language Models." ICML 2024 Workshops: LCFM, 2024.

Markdown

[Bai et al. "LongAlign: A Recipe for Long Context Alignment of Large Language Models." ICML 2024 Workshops: LCFM, 2024.](https://mlanthology.org/icmlw/2024/bai2024icmlw-longalign/)

BibTeX

@inproceedings{bai2024icmlw-longalign,
  title     = {{LongAlign: A Recipe for Long Context Alignment of Large Language Models}},
  author    = {Bai, Yushi and Lv, Xin and Zhang, Jiajie and He, Yuze and Qi, Ji and Hou, Lei and Tang, Jie and Dong, Yuxiao and Li, Juanzi},
  booktitle = {ICML 2024 Workshops: LCFM},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/bai2024icmlw-longalign/}
}