A Dantzig Selector Approach to Temporal Difference Learning

Abstract

LSTD is a popular algorithm for value function approximation. Whenever the number of features is larger than the number of samples, it must be paired with some form of regularization. In particular, l1-regularization methods tend to perform feature selection by promoting sparsity, and thus are well-suited for high-dimensional problems. However, since LSTD is not a simple regression algorithm but rather solves a fixed-point problem, its integration with l1-regularization is not straightforward and may come with some drawbacks (e.g., the P-matrix assumption for LASSO-TD). In this paper, we introduce a novel algorithm obtained by integrating LSTD with the Dantzig Selector. We investigate the performance of the proposed algorithm and its relationship with the existing regularized approaches, and show how it addresses some of their drawbacks.
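To make the idea concrete, here is a minimal sketch of a Dantzig-Selector-style estimator for the LSTD fixed-point equations, solved as a linear program. This is an illustration under our own assumptions (synthetic random features, variable names, and the scipy LP solver are ours), not the authors' reference implementation:

```python
import numpy as np
from scipy.optimize import linprog

# Sketch: minimize ||theta||_1 subject to
#   ||Phi^T (R + gamma * Phi_next @ theta - Phi @ theta)||_inf <= lam,
# i.e. a Dantzig-Selector constraint on the LSTD fixed-point residual.

rng = np.random.default_rng(0)
n, d, gamma, lam = 50, 8, 0.9, 0.1
Phi = rng.standard_normal((n, d))        # features at sampled states s_i (synthetic)
Phi_next = rng.standard_normal((n, d))   # features at successor states s'_i (synthetic)
R = rng.standard_normal(n)               # observed rewards (synthetic)

A = Phi.T @ (Phi - gamma * Phi_next)     # LSTD system matrix
b = Phi.T @ R                            # LSTD right-hand side

# LP reformulation: theta = theta_plus - theta_minus, both nonnegative,
# minimize 1^T (theta_plus + theta_minus)
# subject to  -lam <= A (theta_plus - theta_minus) - b <= lam
c = np.ones(2 * d)
A_ub = np.vstack([np.hstack([A, -A]), np.hstack([-A, A])])
b_ub = np.concatenate([lam + b, lam - b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * d))
theta = res.x[:d] - res.x[d:]

# The l_inf constraint on the fixed-point residual holds at the solution
# (up to solver tolerance), while the l1 objective promotes sparsity.
print(np.max(np.abs(A @ theta - b)))
```

Note that this formulation needs only a linear program (no P-matrix condition on the system matrix), which is one of the points the paper develops in contrasting the Dantzig Selector with LASSO-TD.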

Cite

Text

Geist et al. "A Dantzig Selector Approach to Temporal Difference Learning." International Conference on Machine Learning, 2012.

Markdown

[Geist et al. "A Dantzig Selector Approach to Temporal Difference Learning." International Conference on Machine Learning, 2012.](https://mlanthology.org/icml/2012/geist2012icml-dantzig/)

BibTeX

@inproceedings{geist2012icml-dantzig,
  title     = {{A Dantzig Selector Approach to Temporal Difference Learning}},
  author    = {Geist, Matthieu and Scherrer, Bruno and Lazaric, Alessandro and Ghavamzadeh, Mohammad},
  booktitle = {International Conference on Machine Learning},
  year      = {2012},
  url       = {https://mlanthology.org/icml/2012/geist2012icml-dantzig/}
}