Adaptive Estimation Q-Learning with Uncertainty and Familiarity
Abstract
One of the key problems in model-free deep reinforcement learning is how to obtain more accurate value estimates. The most widely used off-policy algorithms suffer from over- or underestimation bias, which may lead to unstable policies. In this paper, we propose a novel method, Adaptive Estimation Q-learning (AEQ), which uses uncertainty and familiarity to control the value estimation naturally and adapts it to each specific state-action pair. We theoretically prove a property of our familiarity term, which can keep the expected estimation bias approximately zero, and experimentally demonstrate that our dynamic estimation improves performance and prevents the bias from continuously increasing. We evaluate AEQ on several continuous control tasks, where it outperforms state-of-the-art methods. Moreover, AEQ is simple to implement and can be applied to any off-policy actor-critic algorithm.
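The abstract does not give the paper's exact update rule, but the core idea of adaptively trading off pessimism and optimism per state-action pair can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the authors' algorithm: it assumes an ensemble of critics whose disagreement serves as the uncertainty signal, and a scalar `familiarity` in [0, 1] (e.g., a hypothetical visitation-based proxy) that reduces pessimism for well-explored pairs.

```python
import numpy as np

def adaptive_q_target(q_values, familiarity, beta=1.0):
    """Blend pessimistic (min) and optimistic (mean) ensemble Q-targets.

    q_values:    shape (n_critics,), ensemble Q-estimates for one (s, a) pair.
    familiarity: scalar in [0, 1]; higher means the pair has been visited more,
                 so less pessimism is applied (hypothetical proxy).
    beta:        scales how strongly ensemble disagreement (std) pushes the
                 target toward the pessimistic end.
    """
    q_min, q_mean = q_values.min(), q_values.mean()
    uncertainty = q_values.std()
    # Weight in [0, 1]: high uncertainty and low familiarity -> lean on q_min,
    # low uncertainty or high familiarity -> lean on q_mean.
    w = np.clip(beta * uncertainty * (1.0 - familiarity), 0.0, 1.0)
    return w * q_min + (1.0 - w) * q_mean

# Example: two critics disagree and the pair is rarely visited,
# so the target stays close to the pessimistic estimate.
print(adaptive_q_target(np.array([10.0, 12.0]), familiarity=0.1))
```

Because the target is recomputed per state-action pair from the current ensemble and familiarity, the degree of pessimism changes adaptively during training rather than being fixed as in clipped double Q-learning.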
Cite
Text
Gong et al. "Adaptive Estimation Q-Learning with Uncertainty and Familiarity." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/417

Markdown

[Gong et al. "Adaptive Estimation Q-Learning with Uncertainty and Familiarity." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/gong2023ijcai-adaptive/) doi:10.24963/IJCAI.2023/417

BibTeX
@inproceedings{gong2023ijcai-adaptive,
title = {{Adaptive Estimation Q-Learning with Uncertainty and Familiarity}},
author = {Gong, Xiaoyu and Lü, Shuai and Yu, Jiayu and Zhu, Sheng and Li, Zongze},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2023},
pages = {3750-3758},
doi = {10.24963/IJCAI.2023/417},
url = {https://mlanthology.org/ijcai/2023/gong2023ijcai-adaptive/}
}