Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory

Abstract

Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks. At the core of their empirical successes is the learned feature representation, which embeds rich observations, e.g., images and texts, into the latent space that encodes semantic structures. Meanwhile, the evolution of such a feature representation is crucial to the convergence of temporal-difference and Q-learning.
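For readers unfamiliar with the setting the abstract refers to, the sketch below shows one semi-gradient Q-learning update with a two-layer neural-network Q-function, where the hidden-layer activations play the role of the learned feature representation. It is an illustrative, hypothetical example only: all dimensions, hyperparameters, and names are assumptions, and it does not reproduce the paper's mean-field analysis or experiments.

# Minimal, illustrative sketch (not the paper's method) of a semi-gradient
# Q-learning update with a two-layer neural network Q-function. The hidden-layer
# features are the "representation" whose evolution the paper studies.
# All names, dimensions, and hyperparameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

state_dim, n_actions, width = 4, 2, 64
gamma, lr = 0.99, 1e-2

# Two-layer network: Q(s, a) = w[a] . relu(W s + b)
W = rng.normal(scale=1.0 / np.sqrt(state_dim), size=(width, state_dim))
b = np.zeros(width)
w = rng.normal(scale=1.0 / np.sqrt(width), size=(n_actions, width))

def features(s):
    # Hidden-layer features; how these evolve during training is the paper's focus.
    return np.maximum(W @ s + b, 0.0)

def q_values(s):
    return w @ features(s)

def q_learning_step(s, a, r, s_next):
    # One semi-gradient Q-learning update: no gradient flows through the bootstrap target.
    global W, b, w
    phi = features(s)
    target = r + gamma * np.max(q_values(s_next))
    delta = q_values(s)[a] - target          # TD error
    # Gradients of Q(s, a) = w[a] . relu(W s + b) with respect to each parameter.
    relu_mask = (W @ s + b > 0.0).astype(float)
    grad_hidden = w[a] * relu_mask
    w[a] -= lr * delta * phi
    W -= lr * delta * np.outer(grad_hidden, s)
    b -= lr * delta * grad_hidden

# One update on a synthetic transition (s, a, r, s').
s, s_next = rng.normal(size=state_dim), rng.normal(size=state_dim)
q_learning_step(s, a=0, r=1.0, s_next=s_next)
print(q_values(s))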

Cite

Text

Zhang et al. "Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory." Neural Information Processing Systems, 2020.

Markdown

[Zhang et al. "Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/zhang2020neurips-temporaldifference/)

BibTeX

@inproceedings{zhang2020neurips-temporaldifference,
  title     = {{Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory}},
  author    = {Zhang, Yufeng and Cai, Qi and Yang, Zhuoran and Chen, Yongxin and Wang, Zhaoran},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/zhang2020neurips-temporaldifference/}
}