Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Abstract
Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards; optimistic exploration has emerged as a promising direction aligned with this idea, enabling sample-efficient reinforcement learning. However, existing methods overlook a crucial aspect: the need for optimism to be informed by a belief connecting the reward and state. To address this, we propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling. Our approach is the first that allows for reasoning about _joint_ uncertainty over transitions and rewards for optimistic exploration. We apply our method to a set of MuJoCo and VMAS continuous control tasks. Our experiments demonstrate that optimistic exploration significantly accelerates learning in environments with sparse rewards, action penalties, and difficult-to-explore regions. Furthermore, we provide insights into when optimism is beneficial and emphasize the critical role of model uncertainty in guiding exploration.
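To make the core idea concrete, below is a minimal sketch of what optimistic Thompson sampling can look like in a model-based loop: draw several candidate dynamics-and-reward models from an approximate posterior and act under the most optimistic sample. This is an illustrative assumption, not the paper's implementation; the function names (`sample_posterior_models`, `rollout_value`, `optimistic_thompson_step`), the toy linear models, and the random policy are all hypothetical stand-ins for the learned joint belief described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_posterior_models(n_models, state_dim, action_dim):
    """Draw toy dynamics/reward models standing in for posterior samples.

    Each 'model' is a random linear map used only for illustration; the paper's
    method would sample from a learned joint belief over transitions and rewards.
    """
    models = []
    for _ in range(n_models):
        A = rng.normal(size=(state_dim, state_dim + action_dim)) * 0.1  # dynamics
        w = rng.normal(size=state_dim + action_dim) * 0.1               # reward weights
        models.append((A, w))
    return models

def rollout_value(model, state, policy, horizon=10, gamma=0.99):
    """Estimate the discounted return of `policy` under one sampled model."""
    A, w = model
    value, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        sa = np.concatenate([state, action])
        reward = float(w @ sa)   # reward predicted jointly with the dynamics
        state = A @ sa           # predicted next state
        value += discount * reward
        discount *= gamma
    return value

def optimistic_thompson_step(state, policy, action_dim=2, n_models=5):
    """Sample several models and act under the most optimistic one.

    Plain Thompson sampling would use a single posterior draw; taking the max
    over draws injects optimism (a simplified reading of the idea, not the
    authors' algorithm).
    """
    models = sample_posterior_models(n_models, state.size, action_dim)
    values = [rollout_value(m, state, policy) for m in models]
    best = models[int(np.argmax(values))]
    return policy(state), best

# Toy usage: a random policy over a 2-D action space from a 4-D state.
policy = lambda s: rng.normal(size=2)
action, chosen_model = optimistic_thompson_step(np.zeros(4), policy)
```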
Cite
Text
Bayrooti et al. "Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling." International Conference on Learning Representations, 2025.
Markdown
[Bayrooti et al. "Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/bayrooti2025iclr-efficient/)
BibTeX
@inproceedings{bayrooti2025iclr-efficient,
  title = {{Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling}},
  author = {Bayrooti, Jasmine and Ek, Carl Henrik and Prorok, Amanda},
  booktitle = {International Conference on Learning Representations},
  year = {2025},
  url = {https://mlanthology.org/iclr/2025/bayrooti2025iclr-efficient/}
}