Importance Sampling Techniques for Policy Optimization

Abstract

How can we effectively exploit the collected samples when solving a continuous control task with Reinforcement Learning? Recent results have empirically demonstrated that multiple policy optimization steps can be performed with the same batch by using off-distribution techniques based on importance sampling. However, when dealing with off-distribution optimization, it is essential to take into account the uncertainty introduced by the importance sampling process. In this paper, we propose and analyze a class of model-free, policy search algorithms that extend the recent Policy Optimization via Importance Sampling (Metelli et al., 2018) by incorporating two advanced variance reduction techniques: per-decision and multiple importance sampling. For both of them, we derive a high-probability bound, of independent interest, and then we show how to employ it to define a suitable surrogate objective function that can be used for both action-based and parameter-based settings. The resulting algorithms are finally evaluated on a set of continuous control tasks, using both linear and deep policies, and compared with modern policy optimization methods.
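The abstract names two variance reduction techniques, per-decision and multiple importance sampling. The sketch below is a minimal, illustrative toy example of both ideas (it is not the authors' code): per-decision weights accumulate action-probability ratios only up to the current time step, and the balance heuristic reweights samples drawn from several behavioral distributions against their mixture. The 1-D Gaussian policies, helper names, and toy rewards are assumptions made for illustration.

```python
# Illustrative sketch only (not the paper's implementation), assuming
# scalar states/actions and Gaussian policies.
import numpy as np

def gaussian_logpdf(a, mean, std):
    """Log-density of a (possibly vectorized) scalar Gaussian."""
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

def pdis_return(states, actions, rewards, target, behavior, gamma=0.99):
    """Per-decision IS: reward r_t is weighted by the product of
    action-probability ratios up to time t only, reducing variance
    compared with weighting the whole trajectory."""
    log_ratios = np.array([
        gaussian_logpdf(a, *target(s)) - gaussian_logpdf(a, *behavior(s))
        for s, a in zip(states, actions)
    ])
    cum_weights = np.exp(np.cumsum(log_ratios))      # w_{0..t} for each t
    discounts = gamma ** np.arange(len(rewards))
    return np.sum(discounts * cum_weights * np.asarray(rewards))

def balance_heuristic_estimate(samples, values, target_pdf, behavior_pdfs, counts):
    """Multiple IS with the balance heuristic: each sample is reweighted by
    target density over the mixture of the behavioral densities."""
    n_total = sum(counts)
    mixture = sum((n_j / n_total) * q(samples)
                  for q, n_j in zip(behavior_pdfs, counts))
    return np.mean(target_pdf(samples) / mixture * values)

# Toy usage with a hypothetical linear-Gaussian policy (action-based setting).
def make_policy(theta, std=0.5):
    return lambda s: (theta * s, std)

rng = np.random.default_rng(0)
behavior, target = make_policy(0.8), make_policy(1.0)
states = rng.normal(size=20)
actions = np.array([rng.normal(*behavior(s)) for s in states])
rewards = -np.abs(states - actions)                  # toy reward signal
print(pdis_return(states, actions, rewards, target, behavior))

# Toy usage for the parameter-based setting: policy parameters drawn from
# two behavioral hyperpolicies, evaluated against a target hyperpolicy.
counts = [30, 30]
params = np.concatenate([rng.normal(0.5, 0.3, counts[0]),
                         rng.normal(1.2, 0.3, counts[1])])
returns = -(params - 1.0) ** 2                        # toy performance measure
print(balance_heuristic_estimate(
    params, returns,
    target_pdf=lambda x: np.exp(gaussian_logpdf(x, 1.0, 0.3)),
    behavior_pdfs=[lambda x: np.exp(gaussian_logpdf(x, 0.5, 0.3)),
                   lambda x: np.exp(gaussian_logpdf(x, 1.2, 0.3))],
    counts=counts,
))
```

The paper builds on such estimators by bounding the uncertainty they introduce with high probability and optimizing the resulting surrogate objective; the snippet only illustrates the reweighting step itself.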

Cite

Text

Metelli et al. "Importance Sampling Techniques for Policy Optimization." Journal of Machine Learning Research, 2020.

Markdown

[Metelli et al. "Importance Sampling Techniques for Policy Optimization." Journal of Machine Learning Research, 2020.](https://mlanthology.org/jmlr/2020/metelli2020jmlr-importance/)

BibTeX

@article{metelli2020jmlr-importance,
  title     = {{Importance Sampling Techniques for Policy Optimization}},
  author    = {Metelli, Alberto Maria and Papini, Matteo and Montali, Nico and Restelli, Marcello},
  journal   = {Journal of Machine Learning Research},
  year      = {2020},
  pages     = {1--75},
  volume    = {21},
  url       = {https://mlanthology.org/jmlr/2020/metelli2020jmlr-importance/}
}