A Multi-Objective Approach to Mitigate Negative Side Effects
Abstract
Agents operating in unstructured environments often create negative side effects (NSE) that may not be easy to identify at design time. We examine how various forms of human feedback or autonomous exploration can be used to learn a penalty function associated with NSE during system deployment. We formulate the problem of mitigating the impact of NSE as a multi-objective Markov decision process with lexicographic reward preferences and slack. The slack denotes the maximum deviation from the optimal value of the agent's primary objective that is permitted in order to mitigate NSE as a secondary objective. Empirical evaluation of our approach shows that the proposed framework can successfully mitigate NSE and that different feedback mechanisms introduce different biases, which influence the identification of NSE.
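The lexicographic formulation with slack can be illustrated with a small sketch. This is not the authors' implementation. It assumes a finite MDP with known transitions, a hand-specified NSE penalty (the paper instead learns the penalty from feedback or exploration), and a hypothetical per-state tolerance `eta` on primary Q-values as the slack mechanism. All names (`value_iteration`, `lexicographic_policy`, the toy states and rewards) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP encoding: T[state][action] = [(prob, next_state), ...]
Transitions = Dict[str, Dict[str, List[Tuple[float, str]]]]
Rewards = Dict[str, Dict[str, float]]
TOL = 1e-9


def q_value(s: str, a: str, T: Transitions, R: Rewards,
            V: Dict[str, float], gamma: float) -> float:
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])


def value_iteration(T: Transitions, R: Rewards, allowed: Dict[str, List[str]],
                    gamma: float = 0.9) -> Dict[str, float]:
    """Maximize R, restricting each state s to the actions in allowed[s]."""
    V = {s: 0.0 for s in T}
    while True:
        change = 0.0
        for s in T:
            best = max(q_value(s, a, T, R, V, gamma) for a in allowed[s])
            change = max(change, abs(best - V[s]))
            V[s] = best
        if change < TOL:
            return V


def lexicographic_policy(T: Transitions, R_primary: Rewards, R_nse: Rewards,
                         eta: float, gamma: float = 0.9) -> Dict[str, str]:
    # 1) Solve the primary objective over all actions.
    V1 = value_iteration(T, R_primary, {s: list(T[s]) for s in T}, gamma)
    # 2) Slack (assumed form): keep actions whose primary Q-value is within
    #    eta of the optimum; TOL absorbs floating-point error.
    allowed = {
        s: [a for a in T[s]
            if q_value(s, a, T, R_primary, V1, gamma) >= V1[s] - eta - TOL]
        for s in T
    }
    # 3) Secondary objective: minimize the NSE penalty, i.e. maximize -penalty,
    #    using only the actions that survived the slack filter.
    R2 = {s: {a: -R_nse[s][a] for a in T[s]} for s in T}
    V2 = value_iteration(T, R2, allowed, gamma)
    return {s: max(allowed[s], key=lambda a: q_value(s, a, T, R2, V2, gamma))
            for s in T}


# Toy example: the "direct" route is faster but incurs an NSE penalty.
T = {
    "s0": {"direct": [(1.0, "goal")], "detour": [(1.0, "goal")]},
    "goal": {"stay": [(1.0, "goal")]},
}
R_primary = {"s0": {"direct": -1.0, "detour": -2.0}, "goal": {"stay": 0.0}}
R_nse = {"s0": {"direct": 5.0, "detour": 0.0}, "goal": {"stay": 0.0}}

print(lexicographic_policy(T, R_primary, R_nse, eta=0.0))  # {'s0': 'direct', 'goal': 'stay'}
print(lexicographic_policy(T, R_primary, R_nse, eta=1.5))  # {'s0': 'detour', 'goal': 'stay'}
```

With no slack the agent takes the shorter route that incurs the penalty; a tolerance of 1.5 lets it accept the costlier detour, which avoids the NSE at a bounded loss in the primary objective.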
Cite
Text
Saisubramanian et al. "A Multi-Objective Approach to Mitigate Negative Side Effects." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/50
Markdown
[Saisubramanian et al. "A Multi-Objective Approach to Mitigate Negative Side Effects." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/saisubramanian2020ijcai-multi/) doi:10.24963/IJCAI.2020/50
BibTeX
@inproceedings{saisubramanian2020ijcai-multi,
title = {{A Multi-Objective Approach to Mitigate Negative Side Effects}},
author = {Saisubramanian, Sandhya and Kamar, Ece and Zilberstein, Shlomo},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2020},
pages = {354--361},
doi = {10.24963/IJCAI.2020/50},
url = {https://mlanthology.org/ijcai/2020/saisubramanian2020ijcai-multi/}
}