Language Models Are Better than Humans at Next-Token Prediction

Abstract

Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code. However, causal language models are not trained to perform well at these tasks; they are trained to accurately predict the next token given previous tokens in tokenized text. It is not clear whether language models are better or worse than humans at next-token prediction. To try to answer this question, we performed two distinct experiments to directly compare humans and language models on this front: one measuring top-1 accuracy and the other measuring perplexity on OpenWebText. In both experiments, we find humans to be consistently *worse* than relatively small language models like GPT-Neo-1.3B or GPT-2-large at next-token prediction.
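
To make the two metrics concrete, the following is a minimal sketch (not the authors' evaluation code) of how top-1 next-token accuracy and perplexity can be computed for a causal language model using the Hugging Face transformers library. The model name and sample passage below are placeholders; the paper's comparison used text from OpenWebText.

# Minimal sketch of the two metrics from the abstract, assuming the
# Hugging Face `transformers` and `torch` packages are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-large"  # one of the small models compared in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."  # placeholder passage
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# Align predictions with targets: the logits at position t predict token t+1.
preds = logits[:, :-1, :]
targets = input_ids[:, 1:]

# Top-1 accuracy: fraction of positions where the argmax prediction
# matches the actual next token.
top1 = (preds.argmax(dim=-1) == targets).float().mean().item()

# Perplexity: exponential of the mean cross-entropy over predicted positions.
loss = torch.nn.functional.cross_entropy(
    preds.reshape(-1, preds.size(-1)), targets.reshape(-1)
)
ppl = torch.exp(loss).item()

print(f"top-1 accuracy: {top1:.3f}, perplexity: {ppl:.2f}")

The human side of the comparison has no direct analogue of this loop: top-1 accuracy can be elicited by asking a person to guess the next token, while perplexity requires eliciting probabilities, which is why the paper treats the two measurements as distinct experiments.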

Cite

Text

Shlegeris et al. "Language Models Are Better than Humans at Next-Token Prediction." Transactions on Machine Learning Research, 2024.

Markdown

[Shlegeris et al. "Language Models Are Better than Humans at Next-Token Prediction." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/shlegeris2024tmlr-language/)

BibTeX

@article{shlegeris2024tmlr-language,
  title     = {{Language Models Are Better than Humans at Next-Token Prediction}},
  author    = {Shlegeris, Buck and Roger, Fabien and Chan, Lawrence and McLean, Euan},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/shlegeris2024tmlr-language/}
}