Value Pursuit Iteration
Abstract
Value Pursuit Iteration (VPI) is an approximate value iteration algorithm that finds a close-to-optimal policy for reinforcement learning and planning problems with large state spaces. VPI has two main features. First, it is a nonparametric algorithm that finds a good sparse approximation of the optimal value function given a dictionary of features, and it is almost insensitive to the number of irrelevant features. Second, after each iteration, VPI adds to the dictionary a set of functions based on the currently learned value function. This increases the representation power of the dictionary in a way that is directly relevant to the goal of approximating the optimal value function well. We theoretically study VPI and provide a finite-sample error upper bound for it.
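As a rough illustration of the idea in the abstract, the following Python sketch runs fitted value iteration on a toy MDP, fits each Bellman backup with a sparse regression (Orthogonal Matching Pursuit) over a feature dictionary, and then augments the dictionary with functions derived from the current value estimate. This is only a sketch of the general scheme under simplifying assumptions (state-value functions, an exactly known toy model, off-the-shelf OMP); it is not the algorithm analyzed in the paper, and all names and numbers below are assumptions made for this example.

```python
# Illustrative sketch only: sparse fitted value iteration with dictionary
# augmentation, in the spirit of VPI (not the authors' exact algorithm).
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n_states, n_actions = 200, 4   # assumed toy problem sizes
gamma = 0.95                   # discount factor (assumed)

# Toy MDP: random transition kernel P[a, s, s'] and reward table R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.normal(size=(n_states, n_actions))

# Initial dictionary: each atom is a feature vector evaluated on all states.
n_atoms = 50
dictionary = [rng.normal(size=n_states) for _ in range(n_atoms)]

V = np.zeros(n_states)
for _ in range(20):
    # Bellman optimality backup used as regression targets
    # (a sample-based version would use observed transitions instead).
    targets = np.max(R + gamma * np.einsum('ast,t->sa', P, V), axis=1)

    # Sparse fit of the backup onto the current dictionary.
    Phi = np.column_stack(dictionary)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10)
    omp.fit(Phi, targets)
    V = omp.predict(Phi)

    # Dictionary augmentation: add functions derived from the current
    # estimate (here, V itself and its greedy Bellman image).
    dictionary.append(V.copy())
    dictionary.append(np.max(R + gamma * np.einsum('ast,t->sa', P, V), axis=1))

# Greedy policy with respect to the final value estimate.
greedy_policy = np.argmax(R + gamma * np.einsum('ast,t->sa', P, V), axis=1)
print("Greedy policy for first 10 states:", greedy_policy[:10])
```

The sparsity level, the number of iterations, and the choice of augmentation functions are all placeholders; the point is the loop structure: backup, sparse approximation over the dictionary, then enlarge the dictionary using the learned estimate.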
Cite
Text
Farahmand and Precup. "Value Pursuit Iteration." Neural Information Processing Systems, 2012.
Markdown
[Farahmand and Precup. "Value Pursuit Iteration." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/farahmand2012neurips-value/)
BibTeX
@inproceedings{farahmand2012neurips-value,
title = {{Value Pursuit Iteration}},
author = {Farahmand, Amir M. and Precup, Doina},
booktitle = {Neural Information Processing Systems},
year = {2012},
pages = {1340-1348},
url = {https://mlanthology.org/neurips/2012/farahmand2012neurips-value/}
}