Multi-Objective Evolution for Generalizable Policy Gradient Algorithms

Abstract

Performance, generalizability, and stability are three Reinforcement Learning (RL) challenges relevant to many practical applications, in which they often arise in combination. Still, state-of-the-art RL algorithms fall short when addressing multiple RL objectives simultaneously, and current human-driven design practices might not be well-suited for multi-objective RL. In this paper, we present MetaPG, an evolutionary method that discovers new RL algorithms represented as graphs, following a multi-objective search criterion in which different RL objectives are encoded in separate fitness scores. Our findings show that, when using a graph-based implementation of Soft Actor-Critic (SAC) to initialize the population, our method is able to find new algorithms that improve upon SAC's performance and generalizability by 3% and 17%, respectively, and reduce instability by up to 65%. In addition, we analyze the graph structure of the best algorithms in the population and offer an interpretation of specific elements that help trade performance for generalizability and vice versa. We validate our findings in three different continuous control tasks: RWRL Cartpole, RWRL Walker, and Gym Pendulum.
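
To illustrate what keeping objectives in separate fitness scores can look like, here is a minimal sketch of Pareto-front selection over a population. It is not taken from the paper: the abstract does not state which selection rule MetaPG uses, so Pareto dominance is an assumption, as are the names `Fitness`, `dominates`, and `pareto_front`, the convention that every score is higher-is-better (stability rather than instability), and the example numbers, which are purely illustrative.

```python
from typing import List, Tuple

# Hypothetical fitness tuple: (performance, generalizability, stability).
# Assumption: every objective is expressed so that higher is better.
Fitness = Tuple[float, float, float]


def dominates(a: Fitness, b: Fitness) -> bool:
    """Return True if `a` is no worse than `b` on every objective
    and strictly better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(population: List[Fitness]) -> List[int]:
    """Return the indices of individuals that no other individual dominates."""
    return [
        i
        for i, fit_i in enumerate(population)
        if not any(
            dominates(fit_j, fit_i)
            for j, fit_j in enumerate(population)
            if j != i
        )
    ]


# Illustrative scores only; these are not results from the paper.
scores = [
    (0.90, 0.50, 0.70),  # strong performance
    (0.85, 0.65, 0.75),  # trades some performance for generalizability
    (0.80, 0.40, 0.60),  # worse than the first candidate on every objective
    (0.90, 0.50, 0.60),  # ties the first candidate twice, loses on stability
]
front = pareto_front(scores)
print(front)
```

Because no single scalar ranks the candidates, the first two individuals both survive: each is better than the other on at least one objective. This is the kind of trade-off between performance and generalizability that the abstract describes.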

Cite

Text

Garau-Luis et al. "Multi-Objective Evolution for Generalizable Policy Gradient Algorithms." ICLR 2022 Workshops: GPL, 2022.

Markdown

[Garau-Luis et al. "Multi-Objective Evolution for Generalizable Policy Gradient Algorithms." ICLR 2022 Workshops: GPL, 2022.](https://mlanthology.org/iclrw/2022/garauluis2022iclrw-multiobjective/)

BibTeX

@inproceedings{garauluis2022iclrw-multiobjective,
  title     = {{Multi-Objective Evolution for Generalizable Policy Gradient Algorithms}},
  author    = {Garau-Luis, Juan Jose and Miao, Yingjie and Co-Reyes, John D and Parisi, Aaron and Tan, Jie and Real, Esteban and Faust, Aleksandra},
  booktitle = {ICLR 2022 Workshops: GPL},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/garauluis2022iclrw-multiobjective/}
}