Geometric Variance Reduction in Markov Chains. Application to Value Function and Gradient Estimation

Abstract

We study a sequential variance reduction technique for Monte Carlo estimation of functionals in Markov Chains. The method is based on designing sequential control variates us-ing successive approximations of the function of interest V. Regular Monte Carlo estimates have a variance of O(1=N), where N is the number of samples. Here, we obtain a geo-metric variance reduction O(N) (with < 1) up to a thresh-old that depends on the approximation error V AV, where A is an approximation operator linear in the values. Thus, if V belongs to the right approximation space (i.e. AV = V), the variance decreases geometrically to zero. An immediate application is value function estimation in Markov chains, which may be used for policy evaluation in policy iteration for Markov Decision Processes. Another important domain, for which variance reduction is highly needed, is gradient estimation, that is computing the sensitivity @V of the performance measure V with respect to some parameter of the transition probabilities. For ex-ample, in parametric optimization of the policy, an estimate of the policy gradient is required to perform a gradient opti-mization method. We show that, using two approximations, the value func-tion and the gradient, a geometric variance reduction is also achieved, up to a threshold that depends on the approximation errors of both of those representations.

Cite

Text

Munos. "Geometric Variance Reduction in Markov Chains. Application to Value Function and Gradient Estimation." AAAI Conference on Artificial Intelligence, 2005.

Markdown

[Munos. "Geometric Variance Reduction in Markov Chains. Application to Value Function and Gradient Estimation." AAAI Conference on Artificial Intelligence, 2005.](https://mlanthology.org/aaai/2005/munos2005aaai-geometric/)

BibTeX

@inproceedings{munos2005aaai-geometric,
  title     = {{Geometric Variance Reduction in Markov Chains. Application to Value Function and Gradient Estimation}},
  author    = {Munos, Rémi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {1012-1017},
  url       = {https://mlanthology.org/aaai/2005/munos2005aaai-geometric/}
}