Multi-Critic Actor Learning: Teaching RL Policies to Act with Style

Abstract

Using a single value function (critic) shared across multiple tasks in actor-critic multi-task reinforcement learning (MTRL) can result in negative interference between tasks, which can compromise learning performance. Multi-Critic Actor Learning (MultiCriticAL) instead maintains a separate critic for each task being trained while training a single multi-task actor. Explicitly distinguishing between tasks also eliminates the need for critics to learn to do so and mitigates interference between task-value estimates. MultiCriticAL is tested in the context of multi-style learning, a special case of MTRL where agents are trained to behave with distinct behavior styles. It yields up to 56% performance gains over single-critic baselines and successfully learns behavior styles in cases where single-critic approaches can simply fail to learn. In a simulated real-world use case, MultiCriticAL enables learning policies that smoothly transition between multiple fighting styles on an experimental build of EA’s UFC game.
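To make the idea concrete, below is a minimal sketch of the multi-critic structure: a single actor conditioned on a one-hot task (style) label, paired with one independent critic per task, where each critic is trained only on its own task's transitions and the shared actor is updated against the matching critic. The DDPG-style one-step TD update, the network sizes, and all names (MultiCriticAgent, update, etc.) are illustrative assumptions for this sketch, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiCriticAgent(nn.Module):
    def __init__(self, obs_dim, act_dim, n_tasks, hidden=256):
        super().__init__()
        self.n_tasks = n_tasks
        # A single actor shared across all tasks, conditioned on a
        # one-hot task (style) label appended to the observation.
        self.actor = nn.Sequential(
            nn.Linear(obs_dim + n_tasks, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
        # One independent critic per task, so per-task value estimates
        # cannot interfere with one another.
        self.critics = nn.ModuleList(
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_tasks)
        )

    def act(self, obs, task_ids):
        onehot = F.one_hot(task_ids, self.n_tasks).float()
        return self.actor(torch.cat([obs, onehot], dim=-1))

def update(agent, batches, actor_opt, critic_opts, gamma=0.99):
    """One gradient step per task; batches[t] holds task-t transitions."""
    for t, (obs, act, rew, next_obs, done) in enumerate(batches):
        ids = torch.full((obs.shape[0],), t, dtype=torch.long)
        # Critic t is fit only on transitions collected for task t,
        # using a standard one-step TD target.
        with torch.no_grad():
            next_q = agent.critics[t](
                torch.cat([next_obs, agent.act(next_obs, ids)], dim=-1)
            ).squeeze(-1)
            target = rew + gamma * (1.0 - done) * next_q
        q = agent.critics[t](torch.cat([obs, act], dim=-1)).squeeze(-1)
        critic_loss = F.mse_loss(q, target)
        critic_opts[t].zero_grad()
        critic_loss.backward()
        critic_opts[t].step()
        # The single shared actor is improved on task-t states using
        # only critic t's value estimate.
        actor_loss = -agent.critics[t](
            torch.cat([obs, agent.act(obs, ids)], dim=-1)
        ).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

The design point the sketch illustrates is that each critic only ever models one task's returns, so no single value function has to disentangle tasks, while the actor still shares representation across styles.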

Cite

Text

Mysore et al. "Multi-Critic Actor Learning: Teaching RL Policies to Act with Style." International Conference on Learning Representations, 2022.

Markdown

[Mysore et al. "Multi-Critic Actor Learning: Teaching RL Policies to Act with Style." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/mysore2022iclr-multicritic/)

BibTeX

@inproceedings{mysore2022iclr-multicritic,
  title     = {{Multi-Critic Actor Learning: Teaching RL Policies to Act with Style}},
  author    = {Mysore, Siddharth and Cheng, George and Zhao, Yunqi and Saenko, Kate and Wu, Meng},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/mysore2022iclr-multicritic/}
}