VA-Learning as a More Efficient Alternative to Q-Learning
Abstract
In reinforcement learning, the advantage function is critical for policy improvement, but it is often extracted from a learned Q-function. A natural question is: why not learn the advantage function directly? In this work, we introduce VA-learning, which directly learns the advantage function and the value function using bootstrapping, without explicit reference to Q-functions. VA-learning learns off-policy and enjoys theoretical guarantees similar to those of Q-learning. Thanks to the direct learning of the advantage and value functions, VA-learning improves sample efficiency over Q-learning, both in tabular implementations and in deep RL agents on Atari-57 games. We also identify a close connection between VA-learning and the dueling architecture, which partially explains why a simple architectural change to DQN agents tends to improve performance.
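To make the idea concrete, below is a minimal, hypothetical tabular sketch of learning V(s) and A(s, a) directly from a bootstrapped TD target, without ever forming a Q-table. The specific update rule, the re-centering step, and the toy MDP sizes are illustrative assumptions, not the paper's exact VA-learning algorithm.

```python
import numpy as np

def va_learning_step(V, A, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One hypothetical off-policy update of the value and advantage tables.

    This is a sketch of the general idea described in the abstract (direct
    bootstrapped learning of V and A); the exact rule is an assumption.
    """
    # Greedy bootstrap target built from V and A directly: V(s') + max_a A(s', a).
    target = r + gamma * (V[s_next] + np.max(A[s_next]))
    # A shared TD error drives both tables, with no explicit Q-function.
    td_error = target - (V[s] + A[s, a])
    V[s] += alpha * td_error      # update the state-value estimate
    A[s, a] += alpha * td_error   # update the advantage of the taken action
    # Re-center so max_a A(s, a) stays near zero, keeping V and A identifiable
    # (another modeling assumption, reminiscent of the dueling architecture).
    A[s] -= np.max(A[s])
    return td_error

# Tiny usage example on a toy problem with 3 states and 2 actions (hypothetical).
n_states, n_actions = 3, 2
V = np.zeros(n_states)
A = np.zeros((n_states, n_actions))
va_learning_step(V, A, s=0, a=1, r=1.0, s_next=2)
```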
Cite
Text
Tang et al. "VA-Learning as a More Efficient Alternative to Q-Learning." International Conference on Machine Learning, 2023.
Markdown
[Tang et al. "VA-Learning as a More Efficient Alternative to Q-Learning." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/tang2023icml-valearning/)
BibTeX
@inproceedings{tang2023icml-valearning,
title = {{VA-Learning as a More Efficient Alternative to Q-Learning}},
author = {Tang, Yunhao and Munos, Remi and Rowland, Mark and Valko, Michal},
booktitle = {International Conference on Machine Learning},
year = {2023},
pages = {33739--33757},
volume = {202},
url = {https://mlanthology.org/icml/2023/tang2023icml-valearning/}
}