Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Abstract

Both surprise-minimizing and surprise-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone performs well across all entropy regimes. In an effort to find a single surprise-based method that encourages emergent behaviors in any environment, we propose an agent that adapts its objective online to the entropy conditions of its environment, framing the choice between objectives as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit, which captures the agent's ability to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes.
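
The mechanism described in the abstract can be sketched as a two-armed bandit that picks between the two intrinsic objectives each episode. The sketch below is illustrative only: the UCB selection rule, the arm definitions over a density model's log p(s), and the feedback signal (normalized deviation of the agent's episode surprise from a random-policy baseline) are assumptions standing in for the paper's exact formulation.

import numpy as np

class ObjectiveBandit:
    """Two-armed UCB bandit: arm 0 minimizes surprise, arm 1 maximizes it."""

    def __init__(self, c=2.0):
        self.c = c                   # exploration coefficient (assumed value)
        self.counts = np.zeros(2)    # number of pulls per arm
        self.values = np.zeros(2)    # running mean feedback per arm

    def select_arm(self):
        # Pull each arm once before applying the UCB rule.
        if self.counts.min() == 0:
            return int(np.argmin(self.counts))
        t = self.counts.sum()
        ucb = self.values + self.c * np.sqrt(np.log(t) / self.counts)
        return int(np.argmax(ucb))

    def update(self, arm, feedback):
        self.counts[arm] += 1
        self.values[arm] += (feedback - self.values[arm]) / self.counts[arm]


def intrinsic_reward(arm, log_prob):
    # log_prob is a density model's log p(s) for the visited state.
    # Arm 0 rewards likely states (surprise minimization);
    # arm 1 rewards unlikely states (surprise maximization / curiosity).
    return log_prob if arm == 0 else -log_prob


def entropy_control_feedback(agent_surprise, baseline_surprise, eps=1e-8):
    # Hypothetical bandit feedback: how far the agent pushes its average
    # episode surprise away from a random-policy baseline, in either
    # direction, i.e. a proxy for its ability to control entropy.
    return abs(agent_surprise - baseline_surprise) / (abs(baseline_surprise) + eps)

Under these assumptions, each episode proceeds as: the bandit selects an arm, the RL agent trains on the corresponding intrinsic reward, and the episode-level feedback updates that arm's value estimate, so the agent gravitates toward whichever objective gives it more control over entropy in the current environment.
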

Cite

Text

Hugessen et al. "Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning." NeurIPS 2023 Workshops: IMOL, 2023.

Markdown

[Hugessen et al. "Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning." NeurIPS 2023 Workshops: IMOL, 2023.](https://mlanthology.org/neuripsw/2023/hugessen2023neuripsw-surpriseadaptive/)

BibTeX

@inproceedings{hugessen2023neuripsw-surpriseadaptive,
  title     = {{Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning}},
  author    = {Hugessen, Adriana and Castanyer, Roger Creus and Berseth, Glen},
  booktitle = {NeurIPS 2023 Workshops: IMOL},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/hugessen2023neuripsw-surpriseadaptive/}
}