Dynamic Bottleneck for Robust Self-Supervised Exploration

Abstract

Exploration methods based on pseudo-counts of transitions or curiosity about dynamics have achieved promising results in solving reinforcement learning with sparse rewards. However, such methods are usually sensitive to dynamics-irrelevant information in the environment, e.g., white noise. To handle such dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model, which learns a dynamics-relevant representation based on the information-bottleneck principle. Building on the DB model, we further propose the DB-bonus, which encourages the agent to explore state-action pairs with high information gain. We establish theoretical connections between the proposed DB-bonus, the upper confidence bound (UCB) in the linear case, and the visiting count in the tabular case. We evaluate the proposed method on the Atari suite with dynamics-irrelevant noise. Our experiments show that exploration with the DB-bonus outperforms several state-of-the-art exploration methods in noisy environments.
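As a rough sketch of the information-bottleneck idea the abstract invokes (the notation below is illustrative, not taken from the paper): let $Z_t$ be a compressed representation of the transition input $(s_t, a_t)$. The objective trades off predictiveness of the next state against compression of the input:

```latex
\max_{Z_t} \; I\big(Z_t;\, s_{t+1}\big) \;-\; \beta\, I\big(Z_t;\, (s_t, a_t)\big)
```

where $I(\cdot;\cdot)$ denotes mutual information and $\beta \ge 0$ controls how aggressively dynamics-irrelevant detail (such as white noise) is squeezed out of $Z_t$ while keeping what is predictive of $s_{t+1}$.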

Cite

Text

Bai et al. "Dynamic Bottleneck for Robust Self-Supervised Exploration." Neural Information Processing Systems, 2021.

Markdown

[Bai et al. "Dynamic Bottleneck for Robust Self-Supervised Exploration." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/bai2021neurips-dynamic/)

BibTeX

@inproceedings{bai2021neurips-dynamic,
  title     = {{Dynamic Bottleneck for Robust Self-Supervised Exploration}},
  author    = {Bai, Chenjia and Wang, Lingxiao and Han, Lei and Garg, Animesh and Hao, Jianye and Liu, Peng and Wang, Zhaoran},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/bai2021neurips-dynamic/}
}