Hoeffding and Bernstein Races for Selecting Policies in Evolutionary Direct Policy Search

Abstract

Uncertainty arises in reinforcement learning from various sources, and therefore it is necessary to consider statistics based on several roll-outs for evaluating behavioral policies. We add adaptive uncertainty handling based on Hoeffding and empirical Bernstein races to the CMA-ES, a variable metric evolution strategy proposed for direct policy search. The uncertainty handling individually adjusts the number of episodes considered for the evaluation of each policy. The performance estimation is kept just accurate enough for a sufficiently good ranking of candidate policies, which is in turn sufficient for the CMA-ES to find better solutions. This increases the learning speed as well as the robustness of the algorithm.
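The racing idea described in the abstract can be illustrated with a minimal sketch (not the authors' implementation; all names and parameter choices here are hypothetical): each surviving candidate policy receives one more roll-out per round, a confidence interval on its mean return is computed from a Hoeffding or empirical Bernstein bound, and candidates whose upper confidence bound falls below some rival's lower bound are discarded. Sampling stops once the intervals separate the candidates well enough, rather than after a fixed number of episodes.

```python
import math
import random

def hoeffding_radius(n, delta, value_range):
    # Hoeffding confidence radius for n i.i.d. samples in an interval of
    # width value_range; holds with probability >= 1 - delta.
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def bernstein_radius(n, delta, value_range, emp_var):
    # Empirical Bernstein radius; tighter than Hoeffding when the
    # empirical variance emp_var is small.
    return (math.sqrt(2.0 * emp_var * math.log(3.0 / delta) / n)
            + 3.0 * value_range * math.log(3.0 / delta) / n)

def race(policies, evaluate, n_max=200, delta=0.05, value_range=1.0):
    """Sample roll-outs for all surviving policies until the confidence
    intervals separate (one survivor remains) or n_max episodes are used."""
    samples = {p: [] for p in policies}
    alive = set(policies)
    for n in range(1, n_max + 1):
        for p in alive:
            samples[p].append(evaluate(p))  # one more roll-out per round
        if n < 2:
            continue  # need at least two samples for an empirical variance
        bounds = {}
        for p in alive:
            xs = samples[p]
            mean = sum(xs) / n
            var = sum((x - mean) ** 2 for x in xs) / (n - 1)
            r = bernstein_radius(n, delta, value_range, var)
            bounds[p] = (mean - r, mean + r)
        # discard policies whose upper bound lies below a rival's lower bound
        best_lower = max(lo for lo, _ in bounds.values())
        alive = {p for p in alive if bounds[p][1] >= best_lower}
        if len(alive) == 1:
            break
    # rank the survivors by empirical mean return, best first
    return sorted(alive, key=lambda p: -sum(samples[p]) / len(samples[p]))
```

In the paper the race does not pick a single winner but stops as soon as the mu best policies of the CMA-ES population are ranked reliably; the sketch above shows only the elimination mechanism common to both bounds.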

Cite

Text

Heidrich-Meisner and Igel. "Hoeffding and Bernstein Races for Selecting Policies in Evolutionary Direct Policy Search." International Conference on Machine Learning, 2009. doi:10.1145/1553374.1553426

Markdown

[Heidrich-Meisner and Igel. "Hoeffding and Bernstein Races for Selecting Policies in Evolutionary Direct Policy Search." International Conference on Machine Learning, 2009.](https://mlanthology.org/icml/2009/heidrichmeisner2009icml-hoeffding/) doi:10.1145/1553374.1553426

BibTeX

@inproceedings{heidrichmeisner2009icml-hoeffding,
  title     = {{Hoeffding and Bernstein Races for Selecting Policies in Evolutionary Direct Policy Search}},
  author    = {Heidrich-Meisner, Verena and Igel, Christian},
  booktitle = {International Conference on Machine Learning},
  year      = {2009},
  pages     = {401--408},
  doi       = {10.1145/1553374.1553426},
  url       = {https://mlanthology.org/icml/2009/heidrichmeisner2009icml-hoeffding/}
}