Addressing Environment Non-Stationarity by Repeating Q-Learning Updates
Abstract
Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to optimal policies in Markov decision processes. However, QL exhibits an artifact: in expectation, the effective rate of updating the value of an action depends on the probability of choosing that action. In other words, there is a tight coupling between the learning dynamics and underlying execution policy. This coupling can cause performance degradation in noisy non-stationary environments.
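To make the coupling concrete, here is a minimal sketch (not code from the paper; all names are illustrative) of the standard tabular Q-learning update. Because the update only fires on steps where the action was actually executed, the expected per-step change of Q(s, a) scales with the behaviour policy's probability of selecting a, which is the artifact the abstract describes. The second function sketches the repeated-update idea the title alludes to, under the assumption that the update is repeated roughly 1/π(a|s) times; the paper's exact rule is not reproduced here.

```python
def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning update for one observed transition.

    Q maps each state to a dict of action values. Since this update runs
    only when action `a` was actually chosen, the expected per-step change
    of Q[s][a] is proportional to pi(a | s) * alpha: the effective learning
    rate is coupled to the execution policy.
    """
    td_target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (td_target - Q[s][a])


def repeated_update_step(Q, s, a, r, s_next, pi_sa, alpha=0.1, gamma=0.95):
    """Sketch of the repeated-update idea (assumption, not the paper's exact rule).

    Repeating the one-step update 1 / pi(a | s) times is, in closed form,
    the same as using the effective rate 1 - (1 - alpha) ** (1 / pi(a | s)),
    which offsets the policy's influence on the expected update rate.
    """
    assert pi_sa > 0.0, "action must have non-zero selection probability"
    effective_alpha = 1.0 - (1.0 - alpha) ** (1.0 / pi_sa)
    td_target = r + gamma * max(Q[s_next].values())
    Q[s][a] += effective_alpha * (td_target - Q[s][a])
```

For instance, under an ε-greedy policy that selects an exploratory action with probability 0.05, the standard step only nudges that action's value by α on the rare occasions it is taken, whereas the repeated variant compensates with a correspondingly larger effective rate.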
Cite
Text
Abdallah and Kaisers. "Addressing Environment Non-Stationarity by Repeating Q-Learning Updates." Journal of Machine Learning Research, 2016.
Markdown
[Abdallah and Kaisers. "Addressing Environment Non-Stationarity by Repeating Q-Learning Updates." Journal of Machine Learning Research, 2016.](https://mlanthology.org/jmlr/2016/abdallah2016jmlr-addressing/)
BibTeX
@article{abdallah2016jmlr-addressing,
  title = {{Addressing Environment Non-Stationarity by Repeating Q-Learning Updates}},
  author = {Abdallah, Sherief and Kaisers, Michael},
  journal = {Journal of Machine Learning Research},
  year = {2016},
  volume = {17},
  pages = {1-31},
  url = {https://mlanthology.org/jmlr/2016/abdallah2016jmlr-addressing/}
}