Causality in Goal Conditioned RL: Return to No Future?
Abstract
The main goal of goal-conditioned RL (GCRL) is to learn actions that maximize the conditional probability of achieving the desired goal from the current state. To improve sample efficiency, GCRL utilizes either 1) imitation learning with expert demonstrations or 2) supervised learning with self-imitation, denoted goal-conditioned RL with supervised learning (GCRL-SL). GCRL-SL algorithms directly estimate the probability of an action ($A=a$) given the current state ($S=s$) and a future, observed goal ($G=g$) from batch data generated under a behavior policy. The optimal action then maximizes an estimate of $P(A \mid S=s, G=g)$. One crucial insight missing from empirical and theoretical work on GCRL concerns the causal interpretation of the policy learned by GCRL algorithms. In this study, we begin exploring a question crucial for ensuring safe and robust decision-making: what causal biases arise in the GCRL training process, and when can these biases lead to a poor policy? Our theoretical and empirical analysis demonstrates that GCRL algorithms can learn poor policies when the training data follows particular causal graphs. This issue is especially problematic when deploying GCRL in environments with potential unmeasured confounding, as often encountered in healthcare and mobile health applications.
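As a rough illustration of the GCRL-SL recipe described in the abstract, the sketch below fits a classifier for $P(A \mid S, G)$ on hindsight-relabeled batch data and then acts greedily with respect to the estimate. The toy dynamics, network choice (a multinomial logistic regression), and relabeling scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal GCRL-SL sketch (illustrative assumptions, not the paper's method):
# 1) collect trajectories under a behavior policy,
# 2) relabel each (state, action) with a goal observed later in the trajectory,
# 3) fit a classifier for P(A | S, G) by supervised learning,
# 4) act by maximizing the estimated conditional probability.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def collect_batch(num_trajectories=200, horizon=20, state_dim=4, num_actions=3):
    """Simulate batch data from a behavior policy (toy linear dynamics)."""
    trajectories = []
    for _ in range(num_trajectories):
        s = rng.normal(size=state_dim)
        traj = []
        for _ in range(horizon):
            a = rng.integers(num_actions)                       # uniform behavior policy
            s_next = s + 0.1 * (a - 1) + 0.05 * rng.normal(size=state_dim)
            traj.append((s.copy(), a, s_next.copy()))
            s = s_next
        trajectories.append(traj)
    return trajectories

def hindsight_relabel(trajectories):
    """Pair each (s, a) with a goal g observed later in the same trajectory."""
    S, A, G = [], [], []
    for traj in trajectories:
        for t, (s, a, _) in enumerate(traj):
            future = rng.integers(t, len(traj))                 # a future time step
            g = traj[future][2]                                 # its next state is the goal
            S.append(s); A.append(a); G.append(g)
    return np.array(S), np.array(A), np.array(G)

# Fit an estimate of P(A | S, G) with a simple multinomial classifier.
trajs = collect_batch()
S, A, G = hindsight_relabel(trajs)
clf = LogisticRegression(max_iter=1000).fit(np.hstack([S, G]), A)

def policy(state, goal):
    """Greedy GCRL-SL policy: argmax_a of the estimated P(A = a | S = s, G = g)."""
    probs = clf.predict_proba(np.hstack([state, goal]).reshape(1, -1))[0]
    return int(np.argmax(probs))

print(policy(rng.normal(size=4), rng.normal(size=4)))
```

The paper's point is that conditioning on an observed future goal in this way can induce causal biases (e.g., under unmeasured confounding), so the greedy policy above is not guaranteed to be a good interventional policy.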
Cite
Text

Malenica and Murphy. "Causality in Goal Conditioned RL: Return to No Future?." NeurIPS 2023 Workshops: GCRL, 2023.

Markdown

[Malenica and Murphy. "Causality in Goal Conditioned RL: Return to No Future?." NeurIPS 2023 Workshops: GCRL, 2023.](https://mlanthology.org/neuripsw/2023/malenica2023neuripsw-causality/)

BibTeX
@inproceedings{malenica2023neuripsw-causality,
title = {{Causality in Goal Conditioned RL: Return to No Future?}},
author = {Malenica, Ivana and Murphy, Susan},
booktitle = {NeurIPS 2023 Workshops: GCRL},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/malenica2023neuripsw-causality/}
}