Addressing Environment Non-Stationarity by Repeating Q-Learning Updates

Abstract

Q-learning (QL) is a popular reinforcement learning algorithm that is guaranteed to converge to optimal policies in Markov decision processes. However, QL exhibits an artifact: in expectation, the effective rate of updating the value of an action depends on the probability of choosing that action. In other words, there is a tight coupling between the learning dynamics and underlying execution policy. This coupling can cause performance degradation in noisy non-stationary environments.
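To make the coupling concrete, below is a minimal illustrative sketch (not code from the paper): in standard Q-learning, Q(s, a) is only updated when action a is actually taken, so an action selected with probability pi(s, a) is, in expectation, updated pi(s, a) times as often as an action taken every step. The second function shows one way to read the repeated-update idea, applying the same update 1/pi(s, a) times in closed form; the function names and the exact formulation here are our assumptions, not the authors' definitive algorithm.

import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard Q-learning: one update per visit to (s, a).
    Rarely chosen actions are updated less often, so their effective
    learning rate is lower in expectation."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def repeated_q_update(Q, s, a, r, s_next, pi_sa, alpha=0.1, gamma=0.95):
    """Repeated-update sketch (assumed formulation): apply the same
    Q-learning update 1/pi(s, a) times, written in closed form, so the
    expected update rate no longer depends on the selection probability."""
    target = r + gamma * np.max(Q[s_next])
    decay = (1.0 - alpha) ** (1.0 / pi_sa)  # retention after 1/pi repeated updates
    Q[s, a] = decay * Q[s, a] + (1.0 - decay) * target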

Cite

Text

Abdallah and Kaisers. "Addressing Environment Non-Stationarity by Repeating Q-Learning Updates." Journal of Machine Learning Research, 2016.

Markdown

[Abdallah and Kaisers. "Addressing Environment Non-Stationarity by Repeating Q-Learning Updates." Journal of Machine Learning Research, 2016.](https://mlanthology.org/jmlr/2016/abdallah2016jmlr-addressing/)

BibTeX

@article{abdallah2016jmlr-addressing,
  title     = {{Addressing Environment Non-Stationarity by Repeating Q-Learning Updates}},
  author    = {Abdallah, Sherief and Kaisers, Michael},
  journal   = {Journal of Machine Learning Research},
  year      = {2016},
  pages     = {1--31},
  volume    = {17},
  url       = {https://mlanthology.org/jmlr/2016/abdallah2016jmlr-addressing/}
}