A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning

Abstract

Learning a good representation is a crucial challenge for reinforcement learning (RL) agents. Self-predictive algorithms jointly learn a latent representation and a dynamics model by bootstrapping from future latent representations (BYOL). Recent work has developed theoretical insights into these algorithms by studying a continuous-time ODE model under the assumption of a fixed policy (BYOL-$\Pi$); this assumption is at odds with practical implementations, which explicitly condition their predictions on future actions. In this work, we take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective (BYOL-AC) using the ODE framework. Interestingly, we uncover that BYOL-$\Pi$ and BYOL-AC are related through the lens of variance. We unify the study of these objectives through two complementary lenses: a model-based perspective, where each objective is related to a low-rank approximation of certain dynamics, and a model-free perspective, which relates the objectives to modified value, Q-value, and advantage functions. This mismatch with the true value functions is consistent with our empirical observation, in both linear and deep RL experiments, that BYOL-$\Pi$ and BYOL-AC either perform very similarly across many tasks or that the better-performing objective is task-dependent.
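
As an illustrative sketch (not taken verbatim from the paper), the contrast between the two objectives can be written as follows, where $\Phi$ denotes the learned representation, $P$ and $P_a$ are linear latent predictors, $\pi$ is the behavior policy, $T$ the transition kernel, and $\mathrm{sg}$ the stop-gradient operator; the notation here is our own shorthand:

$$
\mathcal{L}_{\text{BYOL-}\Pi}(\Phi, P) \;=\; \mathbb{E}_{s,\; a \sim \pi(\cdot \mid s),\; s' \sim T(\cdot \mid s, a)}\!\left[ \big\lVert P\,\Phi(s) - \mathrm{sg}\!\big(\Phi(s')\big) \big\rVert_2^2 \right]
$$

$$
\mathcal{L}_{\text{BYOL-AC}}(\Phi, \{P_a\}) \;=\; \mathbb{E}_{s,\; a \sim \pi(\cdot \mid s),\; s' \sim T(\cdot \mid s, a)}\!\left[ \big\lVert P_a\,\Phi(s) - \mathrm{sg}\!\big(\Phi(s')\big) \big\rVert_2^2 \right]
$$

Roughly speaking, the only structural difference is that BYOL-AC indexes the predictor by the sampled action $a$, which is what connects it to Q-value-like (rather than value-like) quantities in the model-free perspective mentioned above.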

Cite

Text

Khetarpal et al. "A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Khetarpal et al. "A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/khetarpal2025aistats-unifying/)

BibTeX

@inproceedings{khetarpal2025aistats-unifying,
  title     = {{A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning}},
  author    = {Khetarpal, Khimya and Guo, Zhaohan Daniel and Pires, Bernardo Avila and Tang, Yunhao and Lyle, Clare and Rowland, Mark and Heess, Nicolas and Borsa, Diana L and Guez, Arthur and Dabney, Will},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {181--189},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/khetarpal2025aistats-unifying/}
}