Thomas: Learning to Explore Human Preference via Probabilistic Reward Model
Abstract
Recent breakthroughs in large language models and multimodal models highlight how far deep learning has come in tackling sophisticated tasks once thought achievable only by humans. In particular, inferring human thoughts or interests through communication and feedback is attracting attention for its potential to let machines provide insightful responses or recommendations. Despite this progress, preference learning from human feedback suffers from poor sample complexity: it relies primarily on preferred responses for tuning and therefore fails to capture user preferences holistically. Diversity in the generated responses is also essential, since diverse responses help users identify their genuine preferences, which in turn improves the fine-tuning of the response generation model. In this work, we introduce Thomas, a method that uses Bayesian neural networks to capture user preferences and Thompson sampling to improve the exploration ability of the response generation model. This combination keeps generated responses aligned with user preferences while preserving diversity, thereby accelerating learning. Experiments in synthetic environments confirm that our method quickly adapts to user preferences and generates increasingly favored responses.
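The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a probabilistic reward model combined with Thompson sampling over candidate responses. It assumes MC-dropout as the approximate Bayesian posterior over reward functions; the class and function names (BayesianRewardModel, thompson_select), the architecture, and all dimensions are hypothetical and not taken from the paper.

import torch
import torch.nn as nn

class BayesianRewardModel(nn.Module):
    # Illustrative probabilistic reward model: MC-dropout is used as an
    # approximate posterior over reward functions (an assumption, not the
    # paper's stated architecture).
    def __init__(self, dim, hidden=64, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),  # kept active at inference to draw posterior samples
            nn.Linear(hidden, 1),
        )

    def sample_reward(self, x):
        # One stochastic forward pass corresponds to one sampled reward function.
        self.train()  # keep dropout on even when scoring
        with torch.no_grad():
            return self.net(x).squeeze(-1)

def thompson_select(reward_model, candidate_embeddings):
    # Thompson sampling: draw a single reward function from the posterior,
    # then pick the candidate response that maximizes the sampled reward.
    sampled_rewards = reward_model.sample_reward(candidate_embeddings)
    return int(sampled_rewards.argmax())

# Hypothetical usage: embeddings of 8 candidate responses for one prompt.
candidates = torch.randn(8, 32)
model = BayesianRewardModel(dim=32)
chosen = thompson_select(model, candidates)  # index of the response shown to the user

Because each selection uses a fresh posterior sample, responses the model is uncertain about are occasionally chosen over the current greedy favorite, which is what yields the exploration and diversity the abstract describes.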
Cite
Text
Truong et al. "Thomas: Learning to Explore Human Preference via Probabilistic Reward Model." ICML 2023 Workshops: MFPL, 2023.
Markdown
[Truong et al. "Thomas: Learning to Explore Human Preference via Probabilistic Reward Model." ICML 2023 Workshops: MFPL, 2023.](https://mlanthology.org/icmlw/2023/truong2023icmlw-thomas/)
BibTeX
@inproceedings{truong2023icmlw-thomas,
  title = {{Thomas: Learning to Explore Human Preference via Probabilistic Reward Model}},
  author = {Truong, Sang T. and Nguyen, Duc Quang and Quan, Tho and Koyejo, Sanmi},
  booktitle = {ICML 2023 Workshops: MFPL},
  year = {2023},
  url = {https://mlanthology.org/icmlw/2023/truong2023icmlw-thomas/}
}