Stable Dual Dynamic Programming

Abstract

Recently, we have introduced a novel approach to dynamic programming and reinforcement learning that is based on maintaining explicit representations of stationary distributions instead of value functions. In this paper, we investigate the convergence properties of these dual algorithms both theoretically and empirically, and show how they can be scaled up by incorporating function approximation.
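To make the primal/dual distinction concrete, here is a minimal sketch of policy evaluation on a toy MDP: the primal view solves for a value function, while the dual view solves for a discounted state-visitation distribution and evaluates rewards under it. The MDP, policy, and all variable names are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch (assumed toy example, not the paper's algorithm):
# primal (value-function) vs. dual (stationary-distribution) policy
# evaluation give the same expected discounted return.
import numpy as np

gamma = 0.9
# Policy-induced transition matrix over 3 states (rows sum to 1).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
r = np.array([1.0, 0.0, 2.0])    # expected reward in each state
mu0 = np.array([1.0, 0.0, 0.0])  # initial state distribution

# Primal: solve the Bellman system (I - gamma P) v = r for the value function.
v = np.linalg.solve(np.eye(3) - gamma * P, r)
J_primal = mu0 @ v

# Dual: solve for the (unnormalized) discounted visitation distribution
# d = (I - gamma P^T)^{-1} mu0, then evaluate rewards under it.
d = np.linalg.solve(np.eye(3) - gamma * P.T, mu0)
J_dual = d @ r

# Both representations yield the same return: mu0^T (I - gamma P)^{-1} r.
```

The dual vector `d` plays the role of the explicit distribution representation that the paper's abstract refers to; maintaining `d` rather than `v` is what distinguishes the dual algorithms studied here.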

Cite

Text

Wang et al. "Stable Dual Dynamic Programming." Neural Information Processing Systems, 2007.

Markdown

[Wang et al. "Stable Dual Dynamic Programming." Neural Information Processing Systems, 2007.](https://mlanthology.org/neurips/2007/wang2007neurips-stable/)

BibTeX

@inproceedings{wang2007neurips-stable,
  title     = {{Stable Dual Dynamic Programming}},
  author    = {Wang, Tao and Bowling, Michael and Schuurmans, Dale and Lizotte, Daniel J.},
  booktitle = {Neural Information Processing Systems},
  year      = {2007},
  pages     = {1569--1576},
  url       = {https://mlanthology.org/neurips/2007/wang2007neurips-stable/}
}