Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing

Abstract

The ability to leverage shared behaviors between tasks is critical for sample-efficient multi-task reinforcement learning (MTRL). Prior approaches based on parameter sharing or policy distillation share behaviors uniformly across tasks and states or focus on learning one optimal policy. Therefore, they are fundamentally limited when tasks have conflicting behaviors because no one optimal policy exists. Our key insight is that we can instead share exploratory behavior which can be helpful even when the optimal behaviors differ. Furthermore, as we learn each task, we can guide the exploration by sharing behaviors in a task and state dependent way. To this end, we propose a novel MTRL method, Q-switch Mixture of policies (QMP), that learns to selectively share exploratory behavior between tasks by using a mixture of policies based on estimated discounted returns to gather training data. Experimental results in manipulation and locomotion tasks demonstrate that our method outperforms prior behavior sharing methods, highlighting the importance of task and state dependent sharing. Videos are available at https://sites.google.com/view/qmp-mtrl.
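A minimal sketch of the Q-switch idea described in the abstract: each task's policy proposes an action for the current state, and the current task's estimated discounted return (Q-value) decides which proposal is executed for exploration. The names `select_behavior`, `policies`, and `q_function` are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def select_behavior(state, task_id, policies, q_function):
    """Q-switch behavior selection (sketch).

    policies   -- list of callables, one per task: policy(state) -> action
    q_function -- callable: q_function(state, action, task_id) -> estimated return
    Returns the candidate action that the current task's Q-function rates highest.
    """
    # Every task's policy proposes an exploratory action for the current state.
    candidate_actions = [policy(state) for policy in policies]
    # Score each proposal with the current task's estimated discounted return.
    scores = [q_function(state, action, task_id) for action in candidate_actions]
    # The "Q-switch": keep the proposal with the highest estimated value.
    return candidate_actions[int(np.argmax(scores))]
```

Only data collection is affected by this selection; each task is still trained on its own objective, which is why sharing exploratory behavior remains useful even when the tasks' optimal behaviors conflict.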

Cite

Text

Zhang et al. "Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing." NeurIPS 2022 Workshops: DeepRL, 2022.

Markdown

[Zhang et al. "Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/zhang2022neuripsw-efficient/)

BibTeX

@inproceedings{zhang2022neuripsw-efficient,
  title     = {{Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing}},
  author    = {Zhang, Grace and Jain, Ayush and Hwang, Injune and Sun, Shao-Hua and Lim, Joseph J},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/zhang2022neuripsw-efficient/}
}