Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation

Abstract

We study a variance reduction technique for Monte Carlo estimation of functionals in Markov chains. The method is based on designing sequential control variates using successive approximations of the function of interest V. Regular Monte Carlo estimates have a variance of O(1/N), where N is the number of sample trajectories of the Markov chain. Here, we obtain a geometric variance reduction O(ρ^N) (with ρ < 1), up to a threshold that depends on the approximation error V − AV, where A is an approximation operator linear in the values. Thus, if V belongs to the right approximation space (i.e. AV = V), the variance decreases geometrically to zero.
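A rough illustration of the control-variate idea the abstract describes (a minimal sketch, not the paper's sequential scheme): in a discounted Markov reward process, each term γ^(t+1)·(W(x_{t+1}) − (PW)(x_t)) has zero mean conditionally on x_t, so subtracting these terms from the Monte Carlo return keeps the estimate unbiased, and when the approximation W coincides with the true value function V the corrected return becomes (up to truncation) deterministic. The chain, rewards, and discount factor below are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small 3-state Markov chain (hypothetical, not from the paper).
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Exact discounted value function V = (I - gamma P)^{-1} r, playing the
# role of a "perfect" approximation W = V (i.e. AV = V in the abstract).
V = np.linalg.solve(np.eye(3) - gamma * P, r)

def rollout(s, T=100):
    """One truncated trajectory of states starting from s."""
    states = [s]
    for _ in range(T):
        s = rng.choice(3, p=P[s])
        states.append(s)
    return states

def plain_return(states):
    """Ordinary truncated Monte Carlo return."""
    return sum(gamma**t * r[s] for t, s in enumerate(states))

def cv_return(states, W):
    """Return minus the zero-mean control-variate terms
    gamma^{t+1} * (W(x_{t+1}) - (PW)(x_t))."""
    PW = P @ W
    g = plain_return(states)
    for t in range(len(states) - 1):
        g -= gamma**(t + 1) * (W[states[t + 1]] - PW[states[t]])
    return g

N = 2000
plain = np.array([plain_return(rollout(0)) for _ in range(N)])
corrected = np.array([cv_return(rollout(0), V) for _ in range(N)])

print("true V(0)          :", V[0])
print("plain MC   mean/var:", plain.mean(), plain.var())
print("corrected  mean/var:", corrected.mean(), corrected.var())
```

Both estimators are unbiased for V(0), but with W = V the corrected returns have essentially zero variance; with an imperfect W, the residual variance scales with the approximation error V − W, which is what the paper's sequential scheme drives down geometrically.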

Cite

Text

Munos. "Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation." Journal of Machine Learning Research, 2006.

Markdown

[Munos. "Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation." Journal of Machine Learning Research, 2006.](https://mlanthology.org/jmlr/2006/munos2006jmlr-geometric/)

BibTeX

@article{munos2006jmlr-geometric,
  title     = {{Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation}},
  author    = {Munos, Rémi},
  journal   = {Journal of Machine Learning Research},
  year      = {2006},
  pages     = {413--427},
  volume    = {7},
  url       = {https://mlanthology.org/jmlr/2006/munos2006jmlr-geometric/}
}