Efficient Multi-Objective Reinforcement Learning via Multiple-Gradient Descent with Iteratively Discovered Weight-Vector Sets

Abstract

Solving multi-objective optimization problems is important in applications where users seek optimal policies subject to multiple, often conflicting, objectives. A typical approach is to first construct a loss function by scalarizing the individual objectives and then derive optimal policies that minimize the scalarized loss. Albeit simple and efficient, this approach offers no insight into or mechanism for optimizing the multiple objectives jointly, because it cannot quantify the inter-objective relationship. To address this issue, we propose a new efficient gradient-based multi-objective reinforcement learning approach that iteratively uncovers the quantitative inter-objective relationship by finding a minimum-norm point in the convex hull of the set of policy gradients when the impact of one objective on the others is unknown a priori. In particular, we first propose PAOLS, a new algorithm that integrates pruning with approximate optimistic linear support to efficiently discover the weight-vector sets of the multiple gradients that quantify the inter-objective relationship. We then construct an actor and a multi-objective critic that co-learn the policy and the multi-objective vector value function. Finally, the weight-discovery process and the policy and value-function learning process are executed iteratively to yield stable weight-vector sets and policies. To validate the effectiveness of the proposed approach, we present a quantitative evaluation on three case studies.
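The minimum-norm point mentioned in the abstract is the core of multiple-gradient descent: the shortest vector in the convex hull of the per-objective gradients is a common descent direction for all objectives. As an illustrative sketch only (not the paper's PAOLS algorithm), the standard closed form for the two-gradient case looks like this; `min_norm_point_2d` and the example gradients are hypothetical names chosen for the sketch:

```python
import numpy as np

def min_norm_point_2d(g1, g2):
    """Minimum-norm point in the convex hull of two gradients g1, g2.

    Standard closed form for the two-objective case of multiple-gradient
    descent: find gamma in [0, 1] minimizing ||gamma*g1 + (1-gamma)*g2||.
    """
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:  # gradients coincide; any convex combination works
        return g1.copy()
    gamma = np.clip(((g2 - g1) @ g2) / denom, 0.0, 1.0)
    return gamma * g1 + (1.0 - gamma) * g2

# Two conflicting objective gradients
g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
d = min_norm_point_2d(g1, g2)
# d satisfies d . g_i >= ||d||^2 for both objectives,
# so -d decreases (or leaves unchanged) every objective's loss
print(d)  # [0.5 0.5]
```

With more than two objectives, the same minimum-norm problem is typically solved numerically (e.g. by Frank-Wolfe iterations over the simplex of weights); the weights of the optimal convex combination are exactly the kind of weight vector the paper's discovery process targets.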

Cite

Text

Cao and Zhan. "Efficient Multi-Objective Reinforcement Learning via Multiple-Gradient Descent with Iteratively Discovered Weight-Vector Sets." Journal of Artificial Intelligence Research, 2021. doi:10.1613/JAIR.1.12270

Markdown

[Cao and Zhan. "Efficient Multi-Objective Reinforcement Learning via Multiple-Gradient Descent with Iteratively Discovered Weight-Vector Sets." Journal of Artificial Intelligence Research, 2021.](https://mlanthology.org/jair/2021/cao2021jair-efficient/) doi:10.1613/JAIR.1.12270

BibTeX

@article{cao2021jair-efficient,
  title     = {{Efficient Multi-Objective Reinforcement Learning via Multiple-Gradient Descent with Iteratively Discovered Weight-Vector Sets}},
  author    = {Cao, Yongcan and Zhan, Huixin},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2021},
  pages     = {319--349},
  doi       = {10.1613/JAIR.1.12270},
  volume    = {70},
  url       = {https://mlanthology.org/jair/2021/cao2021jair-efficient/}
}