Q-Learning with UCB Exploration Is Sample Efficient for Infinite-Horizon MDP

Cite

Text

Dong et al. "Q-Learning with UCB Exploration Is Sample Efficient for Infinite-Horizon MDP." International Conference on Learning Representations, 2020.

Markdown

[Dong et al. "Q-Learning with UCB Exploration Is Sample Efficient for Infinite-Horizon MDP." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/dong2020iclr-qlearning/)

BibTeX

@inproceedings{dong2020iclr-qlearning,
  title     = {{Q-Learning with UCB Exploration Is Sample Efficient for Infinite-Horizon MDP}},
  author    = {Dong, Kefan and Wang, Yuanhao and Chen, Xiaoyu and Wang, Liwei},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/dong2020iclr-qlearning/}
}