Efficient Exploration for LLMs

Abstract

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models. In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received. Our best-performing agent generates queries using double Thompson sampling, with uncertainty represented by an epistemic neural network. Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries. Further, both uncertainty estimation and the choice of exploration scheme play critical roles.
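The exploration scheme named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: here a plain ensemble of reward functions stands in for the epistemic neural network, and the selection loop, `max_tries` cap, and fallback are assumptions for illustration. The idea of double Thompson sampling is to choose each of the two responses in a comparison query by independently sampling a reward model from the (approximate) posterior and taking its greedy pick.

```python
import random

def double_thompson_sampling(responses, ensemble, max_tries=10):
    """Pick two distinct responses to present for a human comparison.

    `ensemble` is a list of reward functions, a simple stand-in for an
    epistemic neural network: drawing a random member approximates
    sampling a reward model from the posterior.
    """
    # First arm: sample a reward model, take its highest-reward response.
    first = max(responses, key=random.choice(ensemble))
    # Second arm: resample reward models until a different response wins.
    for _ in range(max_tries):
        candidate = max(responses, key=random.choice(ensemble))
        if candidate != first:
            return first, candidate
    # Fallback (assumption): pick any other response uniformly at random.
    return first, random.choice([r for r in responses if r != first])
```

Under this sketch, disagreement among ensemble members drives exploration: when the members rank responses differently, the two arms tend to differ, so queries concentrate on comparisons the reward model is uncertain about.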

Cite

Text

Dwaracherla et al. "Efficient Exploration for LLMs." International Conference on Machine Learning, 2024.

Markdown

[Dwaracherla et al. "Efficient Exploration for LLMs." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/dwaracherla2024icml-efficient/)

BibTeX

@inproceedings{dwaracherla2024icml-efficient,
  title     = {{Efficient Exploration for LLMs}},
  author    = {Dwaracherla, Vikranth and Asghari, Seyed Mohammad and Hao, Botao and Van Roy, Benjamin},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {12215--12227},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/dwaracherla2024icml-efficient/}
}