Language Models as Recommender Systems: Evaluations and Limitations

Abstract

Pre-trained language models (PLMs) such as BERT and GPT learn general text representations and encode extensive world knowledge, allowing them to adapt efficiently and accurately to various downstream tasks. In this work, we propose to leverage these powerful PLMs as recommender systems, using prompts to reformulate the session-based recommendation task as a multi-token cloze task. We evaluate the proposed method on a movie recommendation dataset in zero-shot and fine-tuned settings, where no or only limited training data are available. In the zero-shot setting, we find that PLMs outperform the random recommendation baseline by a large margin; at the same time, we observe strong linguistic bias when using PLMs as recommenders. In the fine-tuned setting, this bias is reduced as training data become available; however, PLMs tend to underperform traditional recommender system baselines such as GRU4Rec. Our observations demonstrate both the potential opportunities and the current challenges of this novel direction.
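
The abstract describes prompting a PLM so that session-based recommendation becomes a multi-token cloze problem. The paper's own code is not reproduced here; the snippet below is a minimal sketch of one way such a cloze formulation could be scored in the zero-shot setting, using pseudo-log-likelihood of a candidate movie title under an off-the-shelf masked LM. The model name, prompt template, and movie lists are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (not the authors' implementation): rank candidate movies by
# pseudo-log-likelihood of their tokens inside a cloze-style prompt.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def score_candidate(session_movies, candidate):
    """Average masked-LM log-probability of the candidate's tokens in the prompt."""
    prompt = f"A user watched {', '.join(session_movies)}. Now the user wants to watch {candidate}."
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"][0]

    # Locate the candidate's subword positions inside the tokenized prompt.
    cand_ids = tokenizer(candidate, add_special_tokens=False)["input_ids"]
    ids = input_ids.tolist()
    start = next(i for i in range(len(ids)) if ids[i:i + len(cand_ids)] == cand_ids)

    total = 0.0
    for offset, tok_id in enumerate(cand_ids):
        masked = input_ids.clone()
        masked[start + offset] = tokenizer.mask_token_id  # mask one candidate token at a time
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, start + offset]
        total += torch.log_softmax(logits, dim=-1)[tok_id].item()
    return total / len(cand_ids)  # length-normalize so multi-token titles are comparable

# Hypothetical session and candidate pool for illustration.
session = ["The Matrix", "Inception"]
candidates = ["Interstellar", "Titanic", "Toy Story"]
ranking = sorted(candidates, key=lambda c: score_candidate(session, c), reverse=True)
print(ranking)
```

Length normalization is one simple way to compare titles that tokenize into different numbers of subwords; the abstract's observation of linguistic bias suggests such surface-level scoring choices can strongly affect which titles a PLM favors.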

Cite

Text

Zhang et al. "Language Models as Recommender Systems: Evaluations and Limitations." NeurIPS 2021 Workshops: ICBINB, 2021.

Markdown

[Zhang et al. "Language Models as Recommender Systems: Evaluations and Limitations." NeurIPS 2021 Workshops: ICBINB, 2021.](https://mlanthology.org/neuripsw/2021/zhang2021neuripsw-language/)

BibTeX

@inproceedings{zhang2021neuripsw-language,
  title     = {{Language Models as Recommender Systems: Evaluations and Limitations}},
  author    = {Zhang, Yuhui and Ding, Hao and Shui, Zeren and Ma, Yifei and Zou, James and Deoras, Anoop and Wang, Hao},
  booktitle = {NeurIPS 2021 Workshops: ICBINB},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/zhang2021neuripsw-language/}
}