Constrained Contrastive Reinforcement Learning

Abstract

Learning to control from complex observations remains a major challenge in the application of model-based reinforcement learning (MBRL). Existing MBRL methods apply contrastive learning in place of pixel-level reconstruction, improving the performance of the latent world model. However, previous contrastive learning approaches in MBRL fail to utilize task-relevant information, making it difficult to aggregate observations that share the same task-relevant information but differ in task-irrelevant information in the latent space. In this work, we first propose Constrained Contrastive Reinforcement Learning (C2RL), an MBRL method that learns a world model through a combination of two contrastive losses, based on latent dynamics and on task-relevant state abstraction respectively, utilizing reward information to accelerate model learning. We then introduce a hyperparameter $\beta$ to balance the two contrastive losses and strengthen the representation ability of the latent dynamics. Experimental results show that our approach outperforms state-of-the-art methods in both the natural video and standard background settings on challenging DMControl tasks.
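
A minimal sketch of the loss combination described in the abstract, assuming PyTorch, an InfoNCE-style objective for both terms, and a convex combination weighted by $\beta$; the function and tensor names are hypothetical and the exact loss forms are assumptions for illustration, not the authors' implementation:

import torch
import torch.nn.functional as F

def info_nce(queries, keys, temperature=0.1):
    # Standard InfoNCE: row i of `queries` is positive with row i of `keys`,
    # and all other rows in the batch serve as negatives.
    logits = queries @ keys.t() / temperature
    labels = torch.arange(queries.size(0), device=queries.device)
    return F.cross_entropy(logits, labels)

def c2rl_loss(z_pred, z_next, z_anchor, z_same_reward, beta=0.5):
    # Dynamics term: predicted next latents should match encoded next observations.
    dynamics_loss = info_nce(z_pred, z_next)
    # Abstraction term: latents of observations sharing the same task-relevant
    # (reward) information should be pulled together in latent space.
    abstraction_loss = info_nce(z_anchor, z_same_reward)
    # beta balances the two contrastive terms (assumed convex combination).
    return beta * dynamics_loss + (1.0 - beta) * abstraction_loss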

Cite

Text

Wang et al. "Constrained Contrastive Reinforcement Learning." Proceedings of The 14th Asian Conference on Machine Learning, 2022.

Markdown

[Wang et al. "Constrained Contrastive Reinforcement Learning." Proceedings of The 14th Asian Conference on Machine Learning, 2022.](https://mlanthology.org/acml/2022/wang2022acml-constrained/)

BibTeX

@inproceedings{wang2022acml-constrained,
  title     = {{Constrained Contrastive Reinforcement Learning}},
  author    = {Wang, Haoyu and Yang, Xinrui and Wang, Yuhang and Lan, Xuguang},
  booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
  year      = {2022},
  pages     = {1070--1084},
  volume    = {189},
  url       = {https://mlanthology.org/acml/2022/wang2022acml-constrained/}
}