Using Confounded Data in Latent Model-Based Reinforcement Learning
Abstract
In the presence of confounding, naively using off-the-shelf offline reinforcement learning (RL) algorithms leads to sub-optimal behaviour. In this work, we propose a safe method to exploit confounded offline data in model-based RL, which improves the sample-efficiency of an interactive agent that also collects online, unconfounded data. First, we import ideas from the well-established framework of $do$-calculus to express model-based RL as a causal inference problem, thus bridging the gap between the fields of RL and causality. Then, we propose a generic method for learning a causal transition model from offline and online data, which captures and corrects the confounding effect using a hidden latent variable. We prove that our method is correct and efficient, in the sense that it attains better generalization guarantees thanks to the confounded offline data (in the asymptotic case), regardless of the confounding effect (the offline expert's behaviour). We showcase our method on a series of synthetic experiments, which demonstrate that a) using confounded offline data naively degrades the sample-efficiency of an RL agent; b) using confounded offline data correctly improves sample-efficiency.
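To make the mechanism concrete, below is a minimal, hypothetical sketch (not the paper's implementation) of the latent-variable idea in a tabular toy setting: a hidden binary confounder u influences both the offline expert's action and the next state, a single transition table p(s' | s, a, u) is fitted jointly on confounded offline triples and interventional online triples via EM, and the de-confounded prediction marginalizes u with its prior, p(s' | s, do(a)) = Σ_u p(u) p(s' | s, a, u). All names (em_fit, pu, pe, pt) and the binary setting are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
nU = nS = nA = 2  # binary latent confounder, state and action (toy setting)

def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)

def em_fit(offline, online, n_iter=200):
    """Fit p(u), p(a|s,u), p(s'|s,a,u) by EM.
    offline, online: int arrays of shape (n, 3) holding (s, a, s') triples;
    offline actions were chosen by an expert who saw the hidden u (confounded),
    online actions were chosen by the agent, i.e. do(a) (unconfounded)."""
    pu = normalize(rng.random(nU), 0)                 # prior p(u)
    pe = normalize(rng.random((nS, nU, nA)), -1)      # expert policy p(a | s, u)
    pt = normalize(rng.random((nS, nA, nU, nS)), -1)  # transition p(s' | s, a, u)
    for _ in range(n_iter):
        # E-step: posterior over the hidden confounder for each triple.
        s, a, s2 = offline.T
        q_off = normalize(pu * pe[s, :, a] * pt[s, a, :, s2], -1)  # (n_off, nU)
        s, a, s2 = online.T
        q_on = normalize(pu * pt[s, a, :, s2], -1)                 # (n_on, nU)
        # M-step: expected-count updates (with a small smoothing constant).
        pu = normalize(q_off.sum(0) + q_on.sum(0), 0)
        pe_counts = np.full((nS, nU, nA), 1e-6)
        pt_counts = np.full((nS, nA, nU, nS), 1e-6)
        for (s, a, s2), q in zip(offline, q_off):
            pe_counts[s, :, a] += q      # expert policy is only observed offline
            pt_counts[s, a, :, s2] += q
        for (s, a, s2), q in zip(online, q_on):
            pt_counts[s, a, :, s2] += q  # transition table is shared by both regimes
        pe, pt = normalize(pe_counts, -1), normalize(pt_counts, -1)
    return pu, pe, pt

def interventional(pu, pt, s, a):
    """De-confounded prediction p(s' | s, do(a)) = sum_u p(u) p(s' | s, a, u)."""
    return pu @ pt[s, a]

The correction lies entirely in the last line: naive offline learning would instead estimate the observational conditional p(s' | s, a) = Σ_u p(u | s, a) p(s' | s, a, u), which weights u by its action-dependent posterior and is therefore biased by the expert's behaviour.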
Cite
Text
Gasse et al. "Using Confounded Data in Latent Model-Based Reinforcement Learning." Transactions on Machine Learning Research, 2023.
Markdown
[Gasse et al. "Using Confounded Data in Latent Model-Based Reinforcement Learning." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/gasse2023tmlr-using/)
BibTeX
@article{gasse2023tmlr-using,
  title   = {{Using Confounded Data in Latent Model-Based Reinforcement Learning}},
  author  = {Gasse, Maxime and Grasset, Damien and Gaudron, Guillaume and Oudeyer, Pierre-Yves},
  journal = {Transactions on Machine Learning Research},
  year    = {2023},
  url     = {https://mlanthology.org/tmlr/2023/gasse2023tmlr-using/}
}