Goodhart's Law in Reinforcement Learning
Abstract
Implementing a reward function that perfectly captures a complex task in the real world is impractical. As a result, it is often appropriate to think of the reward function as a *proxy* for the true objective rather than as its definition. We study this phenomenon through the lens of *Goodhart's law*, which predicts that increasing optimisation of an imperfect proxy beyond some critical point decreases performance on the true objective. First, we propose a way to *quantify* the magnitude of this effect and *show empirically* that optimising an imperfect proxy reward often leads to the behaviour predicted by Goodhart's law for a wide range of environments and reward functions. We then provide a *geometric explanation* for why Goodhart's law occurs in Markov decision processes. We use these theoretical insights to propose an *optimal early stopping method* that provably avoids this pitfall, and we derive theoretical *regret bounds* for this method. Moreover, we derive a training method that maximises worst-case reward in settings where the true reward function is uncertain. Finally, we evaluate our early stopping method experimentally. Our results lay a foundation for a theoretically principled study of reinforcement learning under reward misspecification.
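The Goodhart effect described above can be illustrated with a toy example. This is a minimal sketch, not the authors' experimental setup or their proposed measure: it assumes a hypothetical single-state problem with three actions, a hand-picked proxy reward (`PROXY_REWARD`) that agrees with the true reward (`TRUE_REWARD`) on some actions but overrates one of them, and it uses the inverse temperature `beta` of a Boltzmann (softmax) policy as a stand-in for "optimisation pressure" on the proxy. The "drop from peak" value is an illustrative quantity chosen for this sketch only.

```python
import math

# Hypothetical 3-action toy problem (illustrative assumption, not from the paper).
# The proxy overrates action 2, which is worthless under the true reward.
TRUE_REWARD = [0.0, 1.0, 0.0]
PROXY_REWARD = [0.0, 0.9, 1.0]


def softmax_policy(rewards, beta):
    """Boltzmann policy: pi(a) is proportional to exp(beta * r(a))."""
    m = max(rewards)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(beta * (r - m)) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward(policy, rewards):
    """Expected reward of a stochastic policy over the actions."""
    return sum(p * r for p, r in zip(policy, rewards))


def goodhart_curve(betas):
    """True-reward value of policies that optimise the proxy ever harder."""
    return [
        expected_reward(softmax_policy(PROXY_REWARD, b), TRUE_REWARD)
        for b in betas
    ]


betas = [0.5 * k for k in range(41)]  # optimisation pressure from 0.0 to 20.0
curve = goodhart_curve(betas)

# Oracle peak: reads the true reward, which a real agent cannot observe.
peak = max(range(len(curve)), key=curve.__getitem__)
drop = curve[peak] - curve[-1]  # illustrative "drop from peak" only

print(f"true reward at beta=0:  {curve[0]:.3f}")
print(f"peak true reward:       {curve[peak]:.3f} at beta={betas[peak]}")
print(f"true reward at beta=20: {curve[-1]:.3f}")
print(f"drop from peak:         {drop:.3f}")
```

With these hand-picked rewards, the true reward first rises above its `beta = 0` value and then falls as the policy concentrates on the action the proxy overrates. Stopping at the peak here relies on reading the true reward directly, which is exactly the information that is unavailable when only the proxy is observed.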
Cite
Karwowski et al. "Goodhart's Law in Reinforcement Learning." International Conference on Learning Representations, 2024.
@inproceedings{karwowski2024iclr-goodhart,
title = {{Goodhart's Law in Reinforcement Learning}},
author = {Karwowski, Jacek and Hayman, Oliver and Bai, Xingjian and Kiendlhofer, Klaus and Griffin, Charlie and Skalse, Joar Max Viktor},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/karwowski2024iclr-goodhart/}
}