From Weighted Classification to Policy Search
Abstract
This paper proposes an algorithm that converts a T-stage stochastic decision problem with a continuous state space into a sequence of supervised learning problems. The optimization problem associated with the trajectory tree and random trajectory methods of Kearns, Mansour, and Ng (2000) is solved using the Gauss-Seidel method. The algorithm breaks a multistage reinforcement learning problem into a sequence of single-stage reinforcement learning subproblems, each of which is solved via an exact reduction to a weighted classification problem that can be handled by off-the-shelf methods. The algorithm thus converts a reinforcement learning problem into a sequence of simpler supervised learning subproblems. The method is shown to converge in a finite number of steps to a solution that cannot be further improved by componentwise optimization. A consequence of the proposed algorithm is that a wide range of classification methods can be applied to find policies in reinforcement learning problems.
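The core loop the abstract describes, cycling through the stages, holding the others fixed, and refitting each stage's policy as a weighted classification problem, can be sketched in a few lines. The sketch below is an illustrative assumption, not the paper's code: the toy dynamics, terminal reward, Monte Carlo rollout estimator, and all helper names (`step`, `act`, `rollout`, `fit_stage`) are invented for the example, and a decision tree trained with `sample_weight` stands in for the off-the-shelf weighted classifier.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
T = 3                     # horizon: number of decision stages
ACTIONS = (-1.0, 1.0)     # two actions -> binary classification

def step(s, a):
    """Toy continuous-state dynamics: drift by the chosen action plus noise."""
    return s + 0.5 * a + 0.1 * rng.standard_normal(s.shape)

def act(clf, s):
    """Map a fitted classifier's 0/1 labels to the two actions."""
    return np.where(clf.predict(s.reshape(-1, 1)) == 1, ACTIONS[1], ACTIONS[0])

def rollout(x, t, a, policies, n=20):
    """Monte Carlo estimate of the terminal reward after taking action a in
    state x at stage t and following the current policies thereafter."""
    s = step(np.full(n, x), a)
    for k in range(t + 1, T):
        s = step(s, act(policies[k], s))
    return (-s**2).mean()            # toy terminal reward: end near the origin

def fit_stage(t, policies, n_states=200):
    """Gauss-Seidel coordinate step: refit stage t as weighted classification.
    The label is the empirically better action; the weight is the estimated
    value gap, i.e. the cost of misclassifying that state."""
    s = rng.standard_normal(n_states) * 2.0
    for k in range(t):               # push start states forward to stage t
        s = step(s, act(policies[k], s))
    q = np.array([[rollout(x, t, a, policies) for a in ACTIONS] for x in s])
    y = (q[:, 1] > q[:, 0]).astype(int)    # better action as the class label
    w = np.abs(q[:, 1] - q[:, 0]) + 1e-9   # value gap as the sample weight
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(s.reshape(-1, 1), y, sample_weight=w)
    return clf

# Start from trivial single-class policies, then sweep stage by stage.
policies = [DecisionTreeClassifier(max_depth=1).fit([[0.0]], [0])
            for _ in range(T)]
for sweep in range(4):               # a few Gauss-Seidel sweeps
    for t in reversed(range(T)):     # optimize one stage (coordinate) at a time
        policies[t] = fit_stage(t, policies)

# Evaluate the learned multistage policy from fresh start states.
s = rng.standard_normal(1000) * 2.0
for t in range(T):
    s = step(s, act(policies[t], s))
print("mean terminal reward:", (-s**2).mean())
```

Each inner iteration improves a single stage's policy while the remaining stages are held fixed, which mirrors the componentwise optimization that the abstract's convergence claim refers to; the exact weighted-classification reduction and convergence argument are given in the paper itself.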
Cite
Text
Blatt and Hero. "From Weighted Classification to Policy Search." Neural Information Processing Systems, 2005.

Markdown
[Blatt and Hero. "From Weighted Classification to Policy Search." Neural Information Processing Systems, 2005.](https://mlanthology.org/neurips/2005/blatt2005neurips-weighted/)

BibTeX
@inproceedings{blatt2005neurips-weighted,
title = {{From Weighted Classification to Policy Search}},
author = {Blatt, Doron and Hero, Alfred O.},
booktitle = {Neural Information Processing Systems},
year = {2005},
pages = {139--146},
url = {https://mlanthology.org/neurips/2005/blatt2005neurips-weighted/}
}