Efficient Sample Reuse in EM-Based Policy Search

Abstract

Direct policy search is a promising reinforcement learning framework, in particular for control of continuous, high-dimensional systems such as anthropomorphic robots. Due to its high flexibility, policy search often requires a large number of samples to obtain a stable policy-update estimator, which is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse (R^3), is demonstrated through a robot learning experiment.
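The core idea of combining an EM-based reward-weighted regression update with importance-weighted reuse of samples from earlier policies can be sketched as follows. This is a minimal illustration only, assuming a linear-Gaussian policy with fixed variance; the function names and simplified setup are ours, not the paper's implementation:

```python
import numpy as np

def gaussian_logpdf(a, mean, sigma):
    """Log-density of a 1-D Gaussian policy N(mean, sigma^2) at action a."""
    return -0.5 * ((a - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def rwr_update(phi, actions, rewards, theta_cur, sigma, theta_behavior=None):
    """One reward-weighted-regression (EM) policy update.

    phi:            (n, d) feature matrix of visited states
    actions:        (n,)   actions taken
    rewards:        (n,)   nonnegative returns used as EM weights
    theta_cur:      (d,)   current policy parameters
    theta_behavior: (d,)   parameters of the (older) policy that generated
                           the samples; if given, each sample is additionally
                           importance-weighted by pi_cur / pi_behavior,
                           which is the sample-reuse idea in a nutshell
    """
    w = rewards.astype(float).copy()
    if theta_behavior is not None:
        logp_cur = gaussian_logpdf(actions, phi @ theta_cur, sigma)
        logp_beh = gaussian_logpdf(actions, phi @ theta_behavior, sigma)
        w *= np.exp(logp_cur - logp_beh)  # importance weights for off-policy data
    # Weighted least squares: the M-step of reward-weighted regression.
    W = np.diag(w)
    return np.linalg.solve(phi.T @ W @ phi, phi.T @ W @ actions)
```

With uniform rewards and on-policy data the update reduces to ordinary least squares, which makes the weighting interpretable: rewards (and, for reused samples, importance ratios) simply reweight a regression of actions on state features.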

Cite

Text

Hachiya et al. "Efficient Sample Reuse in EM-Based Policy Search." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2009. doi:10.1007/978-3-642-04180-8_48

Markdown

[Hachiya et al. "Efficient Sample Reuse in EM-Based Policy Search." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2009.](https://mlanthology.org/ecmlpkdd/2009/hachiya2009ecmlpkdd-efficient/) doi:10.1007/978-3-642-04180-8_48

BibTeX

@inproceedings{hachiya2009ecmlpkdd-efficient,
  title     = {{Efficient Sample Reuse in EM-Based Policy Search}},
  author    = {Hachiya, Hirotaka and Peters, Jan and Sugiyama, Masashi},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2009},
  pages     = {469--484},
  doi       = {10.1007/978-3-642-04180-8_48},
  url       = {https://mlanthology.org/ecmlpkdd/2009/hachiya2009ecmlpkdd-efficient/}
}