Quality-Similar Diversity via Population Based Reinforcement Learning

Abstract

Diversity is a growing research topic in Reinforcement Learning (RL). Previous research on diversity has mainly focused on promoting diversity to encourage exploration and thereby improve quality (the cumulative reward), on maximizing diversity subject to quality constraints, or on jointly maximizing quality and diversity, known as the quality-diversity problem. In this work, we present the quality-similar diversity problem, which seeks diversity among policies of similar quality. In contrast to task-agnostic diversity, we focus on task-specific diversity defined by a set of user-specified Behavior Descriptors (BDs). A BD is a scalar function of a trajectory (e.g., the fire action rate for an Atari game) that expresses the type of diversity the user prefers. The gradient of the user-specified diversity with respect to a policy is not trivially available; to derive it, we introduce a set of BD estimators and connect them with the classical policy gradient theorem. Based on this diversity gradient, we develop a population-based RL algorithm that adaptively and efficiently optimizes the population diversity at multiple quality levels throughout training. Extensive results on MuJoCo and Atari demonstrate that our algorithm significantly outperforms previous methods in generating user-specified diverse policies across different quality levels.
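
To make the notion of a Behavior Descriptor concrete, the fire action rate mentioned in the abstract can be sketched as a scalar function of a trajectory's action sequence. This is only an illustration, not the paper's implementation: the function name `fire_action_rate`, the constant `FIRE`, and the assumption that the fire action has index 1 are all hypothetical and depend on the game's action set.

```python
from typing import Sequence

# Assumed index of the "fire" action; the real index depends on the
# environment's action set and is not specified in the abstract.
FIRE = 1


def fire_action_rate(actions: Sequence[int], fire_action: int = FIRE) -> float:
    """Illustrative BD: fraction of steps in a trajectory that used the fire action."""
    if not actions:
        # An empty trajectory has no actions, so report a rate of zero.
        return 0.0
    fire_steps = sum(1 for a in actions if a == fire_action)
    return fire_steps / len(actions)


# Hypothetical action sequence from one episode (8 steps, 4 of them fire).
trajectory_actions = [0, 1, 1, 3, 1, 2, 0, 1]
rate = fire_action_rate(trajectory_actions)
print(f"fire action rate: {rate:.2f}")
```

Two policies with similar cumulative reward but clearly different values of such a descriptor would count as diverse in the quality-similar sense described above.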

Cite

Text

Wu et al. "Quality-Similar Diversity via Population Based Reinforcement Learning." International Conference on Learning Representations, 2023.

BibTeX

@inproceedings{wu2023iclr-qualitysimilar,
  title     = {{Quality-Similar Diversity via Population Based Reinforcement Learning}},
  author    = {Wu, Shuang and Yao, Jian and Fu, Haobo and Tian, Ye and Qian, Chao and Yang, Yaodong and Fu, Qiang and Wei, Yang},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/wu2023iclr-qualitysimilar/}
}