Goal-Conditioned Q-Learning as Knowledge Distillation
Abstract
Many applications of reinforcement learning can be formalized as goal-conditioned environments, where, in each episode, there is a "goal" that affects the rewards obtained during that episode but does not affect the dynamics. Various techniques have been proposed to improve performance in goal-conditioned environments, such as automatic curriculum generation and goal relabeling. In this work, we explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation. In particular, the current Q-value function and the target Q-value estimate are both functions of the goal, and we would like to train the Q-value function to match its target for all goals. We therefore apply Gradient-Based Attention Transfer (Zagoruyko and Komodakis 2017), a knowledge distillation technique, to the Q-function update. We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional. We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals, where the agent can attain a reward by achieving any one of a large set of objectives, all specified at test time. Finally, to provide theoretical support, we give examples of classes of environments where (under some assumptions) standard off-policy algorithms such as DDPG require at least O(d^2) replay buffer transitions to learn an optimal policy, while our proposed technique requires only O(d) transitions, where d is the dimensionality of the goal and state space. Code and appendix are available at https://github.com/alevine0/ReenGAGE.
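To make the core idea concrete, below is a minimal sketch of the gradient-matching Q-update described in the abstract, written against PyTorch. This is not the authors' released implementation (see the repository linked above for that); the network names (`q_net`, `target_q_net`, `policy_net`) and the `grad_weight` coefficient are illustrative assumptions. The sketch augments the standard TD loss with a term matching the gradient of the Q-value with respect to the goal to the corresponding gradient of the bootstrapped target, in the spirit of Gradient-Based Attention Transfer.

```python
# Minimal sketch of the gradient-matching Q-update described above,
# assuming PyTorch. All network and argument names are hypothetical
# placeholders, not the authors' released API.
import torch
import torch.nn.functional as F

def gradient_matching_q_loss(q_net, target_q_net, policy_net,
                             state, action, reward, next_state, goal,
                             gamma=0.99, grad_weight=1.0):
    """TD loss plus an attention-transfer term: the gradient of Q with
    respect to the goal is trained to match the gradient of the
    bootstrapped target with respect to the goal."""
    # Make the goal a differentiable input so we can take d/d(goal).
    goal = goal.clone().requires_grad_(True)

    # DDPG-style bootstrapped target; kept differentiable w.r.t. the goal.
    next_action = policy_net(next_state, goal)
    target = reward + gamma * target_q_net(next_state, next_action, goal)

    q = q_net(state, action, goal)

    # Gradients of the prediction and of the target w.r.t. the goal input.
    # create_graph=True lets the gradient-matching term itself be optimized.
    dq_dg = torch.autograd.grad(q.sum(), goal, create_graph=True)[0]
    dt_dg = torch.autograd.grad(target.sum(), goal, retain_graph=True)[0]

    # Detach the target quantities so no gradient flows into target nets.
    td_loss = F.mse_loss(q, target.detach())
    grad_loss = F.mse_loss(dq_dg, dt_dg.detach())
    return td_loss + grad_weight * grad_loss
```

Intuitively, each sampled transition then supplies d additional supervision signals (one per goal dimension) rather than a single scalar, which is consistent with the O(d^2)-versus-O(d) sample-complexity separation stated in the abstract.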
Cite
Text
Levine and Feizi. "Goal-Conditioned Q-Learning as Knowledge Distillation." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I7.26024
Markdown
[Levine and Feizi. "Goal-Conditioned Q-Learning as Knowledge Distillation." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/levine2023aaai-goal/) doi:10.1609/AAAI.V37I7.26024
BibTeX
@inproceedings{levine2023aaai-goal,
title = {{Goal-Conditioned Q-Learning as Knowledge Distillation}},
author = {Levine, Alexander and Feizi, Soheil},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {8500--8509},
doi = {10.1609/AAAI.V37I7.26024},
url = {https://mlanthology.org/aaai/2023/levine2023aaai-goal/}
}