Formalizing the Problem of Side Effect Regularization
Abstract
AI objectives are often hard to specify properly. Some approaches tackle this problem by regularizing the AI’s side effects: agents must trade off “how much of a mess they make” against an imperfectly specified proxy objective. We propose a formal criterion for side effect regularization via the assistance game framework [Shah et al., 2021]. In these games, the agent solves a partially observable Markov decision process (POMDP) representing its uncertainty about the objective function it should optimize. We consider the setting where the true objective is revealed to the agent at a later time step. We show that this POMDP is solved by trading off the proxy reward with the agent’s ability to achieve a range of future tasks. We empirically demonstrate the reasonableness of our problem formalization via ground-truth evaluation in two gridworld environments.
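The trade-off the abstract describes can be illustrated with a small sketch: score each move by its proxy reward plus the agent's retained ability to optimize tasks that might be revealed later. This is a minimal toy illustration under our own assumptions (a 4-state deterministic MDP, uniformly random candidate tasks, a hand-picked trade-off weight), not the paper's actual algorithm or environments.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 4, 2
# Toy deterministic transitions: next_state[s, a] gives the successor state.
next_state = np.array([[1, 2], [3, 3], [3, 0], [3, 3]])
# Imperfectly specified proxy reward, indexed by state.
proxy_reward = np.array([0.0, 1.0, 0.2, 0.0])

def value_of_task(reward, gamma=0.9, iters=200):
    """Optimal state values for one candidate task, via value iteration."""
    v = np.zeros(n_states)
    for _ in range(iters):
        # Bellman optimality backup: best successor value per state.
        v = reward + gamma * np.max(v[next_state], axis=1)
    return v

# Sample candidate future tasks the true objective might turn out to be.
tasks = [rng.uniform(0, 1, size=n_states) for _ in range(20)]
# "Ability to achieve future tasks" from each state: average optimal value.
future_ability = np.mean([value_of_task(r) for r in tasks], axis=0)

def score(s, a, trade_off=0.5):
    """Proxy reward of the resulting state plus retained future ability."""
    s2 = next_state[s, a]
    return proxy_reward[s2] + trade_off * future_ability[s2]

best_action = max(range(n_actions), key=lambda a: score(0, a))
```

An action that scores well on the proxy but lands in a state with low `future_ability` (a "mess") is penalized relative to one that preserves optionality, which is the intuition behind regularizing side effects this way.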
Turner et al. "Formalizing the Problem of Side Effect Regularization." NeurIPS 2022 Workshops: MLSW, 2022.
@inproceedings{turner2022neuripsw-formalizing,
  title     = {{Formalizing the Problem of Side Effect Regularization}},
  author    = {Turner, Alexander Matt and Saxena, Aseem and Tadepalli, Prasad},
  booktitle = {NeurIPS 2022 Workshops: MLSW},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/turner2022neuripsw-formalizing/}
}