Robust Multi-Objective Reinforcement Learning with Dynamic Preferences
Abstract
This paper considers multi-objective reinforcement learning (MORL) when preferences over the multiple tasks are not perfectly known. In practice, an agent often pursues tasks with competing goals without knowing exactly how to trade them off. The goal of MORL is thus to learn optimal policies under a set of possible preferences, leading to different trade-offs on the Pareto frontier. Here, we propose a new method that accounts for the dynamics of preferences over tasks. Besides being a more realistic setup in many scenarios, this perspective suggests a simple approach: working with a surrogate state space made up of both states and preferences, which leads to a joint exploration of states and preferences. Static (and possibly unknown) preferences can also be understood as a limiting case of our framework. This allows us to devise both deep Q-learning and actor-critic methods based on planning under a preference-dependent policy and learning the multi-dimensional value function under that policy. Finally, the performance and effectiveness of our method are demonstrated in experiments across several domains.
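The surrogate-state idea lends itself to a compact illustration. Below is a minimal PyTorch sketch (not the authors' implementation): the state and preference vector are concatenated into a surrogate state, the network outputs one multi-dimensional Q-vector per action, and linear scalarisation by the preference weights, an assumption made here for illustration, recovers a scalar Q-value for action selection.

import torch
import torch.nn as nn

class PreferenceConditionedQNet(nn.Module):
    # Illustrative sketch only: a Q-network over the surrogate state
    # (state, preference), returning one Q-vector per action.
    def __init__(self, state_dim, pref_dim, n_actions, hidden=128):
        super().__init__()
        self.n_actions = n_actions
        self.pref_dim = pref_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + pref_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            # one Q-vector of length pref_dim per action
            nn.Linear(hidden, n_actions * pref_dim),
        )

    def forward(self, state, pref):
        x = torch.cat([state, pref], dim=-1)  # surrogate state (s, w)
        q = self.net(x).view(-1, self.n_actions, self.pref_dim)
        # linear scalarisation (assumed): weight each objective by pref
        scalar_q = (q * pref.unsqueeze(1)).sum(dim=-1)  # (batch, n_actions)
        return q, scalar_q

# Usage: greedy action under the current (possibly time-varying) preference.
state = torch.randn(1, 4)           # e.g. a 4-dimensional state
pref = torch.tensor([[0.7, 0.3]])   # weights over 2 objectives
qnet = PreferenceConditionedQNet(state_dim=4, pref_dim=2, n_actions=3)
q_vec, q_scalar = qnet(state, pref)
action = q_scalar.argmax(dim=-1)

Because the preference is part of the input, the same network can be queried under different preference vectors as they evolve over time, which is the intuition behind joint exploration of states and preferences.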
Cite
Text
Buet-Golfouse and Pahwa. "Robust Multi-Objective Reinforcement Learning with Dynamic Preferences." Proceedings of The 14th Asian Conference on Machine Learning, 2022.
Markdown
[Buet-Golfouse and Pahwa. "Robust Multi-Objective Reinforcement Learning with Dynamic Preferences." Proceedings of The 14th Asian Conference on Machine Learning, 2022.](https://mlanthology.org/acml/2022/buetgolfouse2022acml-robust/)
BibTeX
@inproceedings{buetgolfouse2022acml-robust,
title = {{Robust Multi-Objective Reinforcement Learning with Dynamic Preferences}},
author = {Buet-Golfouse, Francois and Pahwa, Parth},
booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
year = {2022},
pages = {96--111},
volume = {189},
url = {https://mlanthology.org/acml/2022/buetgolfouse2022acml-robust/}
}