How Long Can Context Length of Open-Source LLMs Truly Promise?

Abstract

Large language models (LLMs) with long-context instruction-following ability have unlocked new potential, such as supporting long interactive chat sessions. In this paper, we introduce a test suite, LongEval, which enables us to evaluate the long-range retrieval ability of LLMs at various context lengths. We use LongEval to evaluate open-source LLMs and, surprisingly, find that many of them fail to achieve their promised context length. In addition, we present a recipe for fine-tuning a long-context chatbot based on LLaMA models, and introduce LongChat models that support conversations of up to 16,384 tokens. We have released our code at https://github.com/DachengLi1/LongChat.

Cite

Text

Li et al. "How Long Can Context Length of Open-Source LLMs Truly Promise?." NeurIPS 2023 Workshops: Instruction, 2023.

Markdown

[Li et al. "How Long Can Context Length of Open-Source LLMs Truly Promise?." NeurIPS 2023 Workshops: Instruction, 2023.](https://mlanthology.org/neuripsw/2023/li2023neuripsw-long/)

BibTeX

@inproceedings{li2023neuripsw-long,
  title     = {{How Long Can Context Length of Open-Source LLMs Truly Promise?}},
  author    = {Li, Dacheng and Shao, Rulin and Xie, Anze and Sheng, Ying and Zheng, Lianmin and Gonzalez, Joseph and Stoica, Ion and Ma, Xuezhe and Zhang, Hao},
  booktitle = {NeurIPS 2023 Workshops: Instruction},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/li2023neuripsw-long/}
}