A Study of Causal Confusion in Preference-Based Reward Learning

Abstract

While causal confusion and reward gaming behaviors have received substantial empirical and theoretical analysis in reinforcement learning and behavioral cloning, we provide the first systematic study of causal confusion in the context of learning reward functions from preferences. We identify three benchmark domains in which we observe causal confusion when learning reward functions from offline datasets of pairwise trajectory preferences: a simple reacher domain, an assistive feeding domain, and an itch-scratching domain. To gain insight into this observed causal confusion, we perform a sensitivity analysis on the effect of different factors (reward model capacity and feature dimensionality) on the robustness of rewards learned from preferences. We find evidence that learning rewards from preferences is highly sensitive and non-robust to spurious features and increasing model capacity. Videos, code, and supplemental results are available at https://sites.google.com/view/causal-reward-confusion.
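For readers unfamiliar with learning reward functions from pairwise trajectory preferences, the sketch below shows the standard Bradley-Terry style objective commonly used in this setting: a neural reward model scores each trajectory, and the model is trained so that the preferred trajectory of each labeled pair receives the higher predicted return. This is a minimal, hedged illustration of the general technique; the class and function names (RewardNet, preference_loss) and the feature dimensions are hypothetical and not taken from the paper's implementation.

```python
# Minimal sketch of preference-based reward learning with a Bradley-Terry model.
# Names, shapes, and hyperparameters are illustrative assumptions, not the
# authors' exact implementation.
import torch
import torch.nn as nn


class RewardNet(nn.Module):
    """Maps a per-step feature vector to a scalar reward."""

    def __init__(self, feature_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (T, feature_dim) for one trajectory; returns per-step rewards of shape (T,).
        return self.net(features).squeeze(-1)


def preference_loss(reward_net: RewardNet,
                    traj_a: torch.Tensor,
                    traj_b: torch.Tensor,
                    label: float) -> torch.Tensor:
    """Bradley-Terry cross-entropy loss for one pairwise trajectory preference.

    label = 1.0 if traj_a is preferred, 0.0 if traj_b is preferred.
    """
    return_a = reward_net(traj_a).sum()  # predicted return of trajectory A
    return_b = reward_net(traj_b).sum()  # predicted return of trajectory B
    logits = torch.stack([return_a, return_b])
    target = torch.tensor([label, 1.0 - label])
    return -(target * torch.log_softmax(logits, dim=0)).sum()


# Hypothetical usage on a single labeled pair of trajectories:
# net = RewardNet(feature_dim=10)
# loss = preference_loss(net, traj_a, traj_b, label=1.0)
# loss.backward()
```

Because the loss depends only on predicted returns of whole trajectories, any spurious feature that happens to correlate with the preference labels in the offline dataset can be exploited by the reward model, which is the kind of causal confusion the paper studies.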

Cite

Text

Tien et al. "A Study of Causal Confusion in Preference-Based Reward Learning." ICML 2022 Workshops: SCIS, 2022.

Markdown

[Tien et al. "A Study of Causal Confusion in Preference-Based Reward Learning." ICML 2022 Workshops: SCIS, 2022.](https://mlanthology.org/icmlw/2022/tien2022icmlw-study/)

BibTeX

@inproceedings{tien2022icmlw-study,
  title     = {{A Study of Causal Confusion in Preference-Based Reward Learning}},
  author    = {Tien, Jeremy and He, Jerry Zhi-Yang and Erickson, Zackory and Dragan, Anca and Brown, Daniel S.},
  booktitle = {ICML 2022 Workshops: SCIS},
  year      = {2022},
  url       = {https://mlanthology.org/icmlw/2022/tien2022icmlw-study/}
}