Language Models Are Better than Humans at Next-Token Prediction
Abstract
Current language models are considered to have sub-human capabilities at natural language tasks like question-answering or writing code. However, causal language models are not trained to perform well at these tasks; they are trained to accurately predict the next token given previous tokens in tokenized text. It is not clear whether language models are better or worse than humans at next-token prediction. To try to answer this question, we performed two distinct experiments to directly compare humans and language models on this front: one measuring top-1 accuracy and the other measuring perplexity on OpenWebText. In both experiments, we find humans to be consistently *worse* than relatively small language models like GPT-Neo-1.3B or GPT-2-large at next-token prediction.
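The two metrics the abstract compares can be sketched in a few lines. This is a hypothetical helper for illustration, not code from the paper: `evaluate_next_token` scores a predictor's next-token probability distributions against the tokens that actually occurred, returning top-1 accuracy and perplexity (the exponential of the mean negative log-likelihood).

```python
import math

def evaluate_next_token(prob_dists, true_tokens):
    """Score next-token predictions against the tokens that actually occurred.

    prob_dists: list of dicts mapping candidate token -> predicted probability.
    true_tokens: the token that actually came next at each position.
    Returns (top-1 accuracy, perplexity).
    """
    hits = 0
    nll = 0.0
    for dist, tok in zip(prob_dists, true_tokens):
        # Top-1 accuracy: does the single most probable token match reality?
        if max(dist, key=dist.get) == tok:
            hits += 1
        # Perplexity depends on the probability assigned to the true token,
        # whether or not it was ranked first.
        nll += -math.log(dist.get(tok, 1e-12))
    n = len(true_tokens)
    return hits / n, math.exp(nll / n)

# Toy example: two positions, two candidate tokens each.
dists = [{"cat": 0.7, "dog": 0.3}, {"cat": 0.2, "dog": 0.8}]
acc, ppl = evaluate_next_token(dists, ["cat", "cat"])
# The predictor gets position 1 right and position 2 wrong, so acc = 0.5;
# perplexity is exp(-(ln 0.7 + ln 0.2) / 2).
```

Note the asymmetry the paper's two experiments exploit: top-1 accuracy only rewards the argmax token, while perplexity is sensitive to the full probability assigned to the true token, so the two metrics can rank predictors differently.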
Cite
Text
Shlegeris et al. "Language Models Are Better than Humans at Next-Token Prediction." Transactions on Machine Learning Research, 2024.
Markdown
[Shlegeris et al. "Language Models Are Better than Humans at Next-Token Prediction." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/shlegeris2024tmlr-language/)
BibTeX
@article{shlegeris2024tmlr-language,
title = {{Language Models Are Better than Humans at Next-Token Prediction}},
author = {Shlegeris, Buck and Roger, Fabien and Chan, Lawrence and McLean, Euan},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/shlegeris2024tmlr-language/}
}