What Would the Expert $do(\cdot)$?: Causal Imitation Learning
Abstract
We develop algorithms for imitation learning from policy data that was corrupted by unobserved confounders. Sources of such confounding include \textit{(a)} persistent perturbations to actions and \textit{(b)} the expert responding to a part of the state that the learner cannot observe. When a confounder affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the classical \textit{instrumental variable regression} (IVR) technique, enabling us to recover the causally correct underlying policy \textit{without} requiring access to an interactive expert. In particular, we present two techniques: one of a generative-modeling flavor (\texttt{DoubIL}) that can utilize access to a simulator, and one of a game-theoretic flavor (\texttt{ResiduIL}) that can be run entirely offline. We discuss, from the perspective of performance, the types of confounding under which an IVR-based technique is preferable to behavioral cloning and vice versa. We find that both of our algorithms compare favorably to behavioral cloning on a simulated rocket-landing task.
Cite
Text
Swamy et al. "What Would the Expert $do(\cdot)$?: Causal Imitation Learning." NeurIPS 2021 Workshops: DeepRL, 2021.
Markdown
[Swamy et al. "What Would the Expert $do(\cdot)$?: Causal Imitation Learning." NeurIPS 2021 Workshops: DeepRL, 2021.](https://mlanthology.org/neuripsw/2021/swamy2021neuripsw-expert/)
BibTeX
@inproceedings{swamy2021neuripsw-expert,
  title = {{What Would the Expert $do(\cdot)$?: Causal Imitation Learning}},
  author = {Swamy, Gokul and Choudhury, Sanjiban and Bagnell, Drew and Wu, Steven},
  booktitle = {NeurIPS 2021 Workshops: DeepRL},
  year = {2021},
  url = {https://mlanthology.org/neuripsw/2021/swamy2021neuripsw-expert/}
}