The Cross Entropy Method for Fast Policy Search

Abstract

We present a learning framework for Markovian decision processes that is based on optimization in the policy space. Instead of using relatively slow gradient-based optimization algorithms, we use the fast Cross Entropy method. The proposed framework is described for several reward criteria, and its effectiveness is demonstrated on a grid-world navigation task and an inventory control problem.

ICML: Proceedings of the Twentieth International Conference on Machine Learning
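The Cross Entropy method named in the abstract optimizes by repeatedly sampling candidate solutions, keeping an elite fraction ranked by reward, and refitting the sampling distribution to the elites. The sketch below illustrates this generic loop on a toy parameter-search problem; it is not the paper's exact algorithm, and the function names, Gaussian parameterization, and toy reward are illustrative assumptions.

```python
import numpy as np

def cross_entropy_search(reward_fn, dim, n_samples=100, elite_frac=0.2,
                         n_iters=50, seed=0):
    """Generic Cross Entropy optimization: sample parameter vectors from
    a Gaussian, keep the elite fraction by reward, refit the Gaussian."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        # Sample candidates from the current distribution.
        samples = rng.normal(mean, std, size=(n_samples, dim))
        rewards = np.array([reward_fn(s) for s in samples])
        # Select the highest-reward candidates and refit mean/std to them.
        elites = samples[np.argsort(rewards)[-n_elite:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy stand-in for policy search: reward peaks at theta = (1, -2).
target = np.array([1.0, -2.0])
best = cross_entropy_search(lambda th: -np.sum((th - target) ** 2), dim=2)
```

In the paper's setting, the sampled vectors would parameterize policies and the reward would come from simulated trajectories of the Markovian decision process; here a simple quadratic reward keeps the sketch self-contained.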

Cite

Text

Mannor et al. "The Cross Entropy Method for Fast Policy Search." International Conference on Machine Learning, 2003.

Markdown

[Mannor et al. "The Cross Entropy Method for Fast Policy Search." International Conference on Machine Learning, 2003.](https://mlanthology.org/icml/2003/mannor2003icml-cross/)

BibTeX

@inproceedings{mannor2003icml-cross,
  title     = {{The Cross Entropy Method for Fast Policy Search}},
  author    = {Mannor, Shie and Rubinstein, Reuven Y. and Gat, Yohai},
  booktitle = {International Conference on Machine Learning},
  year      = {2003},
  pages     = {512--519},
  url       = {https://mlanthology.org/icml/2003/mannor2003icml-cross/}
}