Lexicographic Multi-Objective Reinforcement Learning
Abstract
In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems involving multiple reward signals, where the goal is to learn a policy that maximises the first reward signal and, subject to this constraint, also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorithms that can be used to solve such problems, and prove that they converge to policies that are lexicographically optimal. We evaluate the scalability and performance of these algorithms empirically, and demonstrate their applicability in practical settings. As a more specific application, we show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
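To illustrate the lexicographic selection idea described in the abstract, here is a minimal Python sketch: actions that are near-optimal under the primary objective are kept, and the secondary objective decides among them. This is only an illustration of the ordering principle, not the paper's algorithms; the function name `lexicographic_action` and the tolerance parameter `slack` are hypothetical choices for this sketch.

```python
import numpy as np

def lexicographic_action(q1, q2, slack=1e-3):
    """Pick an action lexicographically over two Q-value vectors.

    First restrict to actions whose value under the primary objective
    is within `slack` of the best, then maximise the secondary
    objective over that restricted set. (A sketch of lexicographic
    selection only; `slack` is an illustrative tolerance, not a
    parameter taken from the paper.)
    """
    best1 = q1.max()
    admissible = np.flatnonzero(q1 >= best1 - slack)  # near-optimal for objective 1
    return admissible[np.argmax(q2[admissible])]      # tie-break by objective 2

# Usage: Q-values for four actions under two reward signals.
q_primary = np.array([1.0, 0.999, 0.5, 1.0])
q_secondary = np.array([0.2, 0.9, 1.0, 0.4])
print(lexicographic_action(q_primary, q_secondary))  # -> 1
```

Action 2 has the highest secondary value but is excluded because it falls outside the slack band on the primary objective; among the admissible actions {0, 1, 3}, action 1 has the highest secondary value.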
Cite
Text
Skalse et al. "Lexicographic Multi-Objective Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/476
Markdown
[Skalse et al. "Lexicographic Multi-Objective Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/skalse2022ijcai-lexicographic/) doi:10.24963/IJCAI.2022/476
BibTeX
@inproceedings{skalse2022ijcai-lexicographic,
title = {{Lexicographic Multi-Objective Reinforcement Learning}},
author = {Skalse, Joar and Hammond, Lewis and Griffin, Charlie and Abate, Alessandro},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2022},
pages = {3430--3436},
doi = {10.24963/IJCAI.2022/476},
url = {https://mlanthology.org/ijcai/2022/skalse2022ijcai-lexicographic/}
}