Adaptive Estimation Q-Learning with Uncertainty and Familiarity

Abstract

One of the key problems in model-free deep reinforcement learning is obtaining accurate value estimates. The most widely used off-policy algorithms suffer from over- or underestimation bias, which may lead to unstable policies. In this paper, we propose a novel method, Adaptive Estimation Q-learning (AEQ), which uses uncertainty and familiarity to control value estimation naturally and can adapt to each specific state-action pair. We theoretically prove a property of our familiarity term that can keep the expected estimation bias approximately zero, and experimentally demonstrate that our dynamic estimation improves performance and prevents the bias from continuously increasing. We evaluate AEQ on several continuous control tasks, where it outperforms state-of-the-art methods. Moreover, AEQ is simple to implement and can be applied to any off-policy actor-critic algorithm.
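
The abstract describes AEQ only at a high level, so the sketch below is purely illustrative rather than the paper's algorithm. It assumes one common realization of the two ingredients: "uncertainty" as the disagreement (standard deviation) across an ensemble of critics, and "familiarity" as a score in [0, 1] that shifts the bootstrapped target between a pessimistic (min) and an optimistic (mean) ensemble estimate. The function name `adaptive_q_target`, the mixing rule, and the `familiarity` argument are hypothetical.

```python
import torch


def adaptive_q_target(next_q_values, reward, done, gamma=0.99, familiarity=0.5):
    """Illustrative Bellman target that interpolates between the pessimistic
    (min) and optimistic (mean) ensemble estimate, with the mixing weight
    driven by ensemble uncertainty and a familiarity score.

    next_q_values: tensor of shape (num_critics, batch) holding Q(s', a')
                   from each critic in the ensemble.
    familiarity:   hypothetical score in [0, 1]; higher means the
                   state-action pair has been seen more often, so less
                   pessimism is applied.
    """
    q_min = next_q_values.min(dim=0).values   # pessimistic estimate
    q_mean = next_q_values.mean(dim=0)        # optimistic estimate
    uncertainty = next_q_values.std(dim=0)    # ensemble disagreement

    # Higher uncertainty or lower familiarity leans the target toward the
    # minimum; low uncertainty on familiar pairs leans toward the mean.
    weight = torch.sigmoid(uncertainty * (1.0 - familiarity))
    q_next = weight * q_min + (1.0 - weight) * q_mean

    return reward + gamma * (1.0 - done) * q_next
```

In an off-policy actor-critic loop (e.g., a TD3- or SAC-style agent), this target would simply replace the usual clipped-double-Q target when computing the critic loss; the actual per-pair adaptation rule used by AEQ is defined in the paper itself.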

Cite

Text

Gong et al. "Adaptive Estimation Q-Learning with Uncertainty and Familiarity." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/417

Markdown

[Gong et al. "Adaptive Estimation Q-Learning with Uncertainty and Familiarity." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/gong2023ijcai-adaptive/) doi:10.24963/IJCAI.2023/417

BibTeX

@inproceedings{gong2023ijcai-adaptive,
  title     = {{Adaptive Estimation Q-Learning with Uncertainty and Familiarity}},
  author    = {Gong, Xiaoyu and Lü, Shuai and Yu, Jiayu and Zhu, Sheng and Li, Zongze},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {3750--3758},
  doi       = {10.24963/IJCAI.2023/417},
  url       = {https://mlanthology.org/ijcai/2023/gong2023ijcai-adaptive/}
}