On Many-Actions Policy Gradient

Abstract

We study the variance of stochastic policy gradients (SPGs) with many action samples per state. We derive a many-actions optimality condition, which determines when many-actions SPG yields lower variance than a single-action agent with a proportionally extended trajectory. We propose Model-Based Many-Actions (MBMA), an approach that leverages dynamics models for many-actions sampling in the context of SPG. MBMA addresses issues associated with existing implementations of many-actions SPG and yields lower bias and comparable variance to SPG estimated from states in model-simulated rollouts. We find that MBMA's bias and variance structure matches that predicted by theory. As a result, MBMA achieves improved sample efficiency and higher returns on a range of continuous-action environments compared to model-free, many-actions, and model-based on-policy SPG baselines.
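
To make the "many action samples per state" idea concrete, the sketch below shows a generic many-actions score-function (REINFORCE-style) policy gradient: for each state, several actions are sampled from the current policy and their score-function terms are averaged, rather than using a single sampled action. This is a minimal illustrative sketch, not the paper's MBMA method; the `GaussianPolicy` class, the `advantage_fn` callback, and the `n_actions` parameter are assumptions introduced here. In MBMA the per-action advantage estimates would come from rollouts of a learned dynamics model, which is only stubbed out below.

```python
import torch
from torch import nn
from torch.distributions import Normal


class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy over continuous actions (illustrative)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(), nn.Linear(hidden, act_dim)
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs: torch.Tensor) -> Normal:
        return Normal(self.body(obs), self.log_std.exp())


def many_actions_pg_loss(policy, advantage_fn, obs, n_actions=8):
    """Surrogate loss whose gradient is a many-actions SPG estimate.

    advantage_fn(obs, act) -> per-sample advantage estimates (hypothetical);
    in MBMA these would be obtained via a learned dynamics model.
    """
    dist = policy.dist(obs)                  # batch of per-state action distributions
    acts = dist.sample((n_actions,))         # [n_actions, batch, act_dim]
    log_probs = dist.log_prob(acts).sum(-1)  # [n_actions, batch]
    with torch.no_grad():
        adv = advantage_fn(obs.expand(n_actions, *obs.shape), acts)  # [n_actions, batch]
    # Average score-function terms over the sampled actions for each state.
    return -(log_probs * adv).mean()


if __name__ == "__main__":
    policy = GaussianPolicy(obs_dim=3, act_dim=2)
    obs = torch.randn(16, 3)
    fake_adv = lambda o, a: -a.pow(2).sum(-1)  # stand-in advantage for the demo
    loss = many_actions_pg_loss(policy, fake_adv, obs, n_actions=8)
    loss.backward()
```

With `n_actions=1` this reduces to the usual single-action SPG; increasing `n_actions` trades extra action samples per state against trajectory length, which is the comparison the optimality condition in the paper addresses.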

Cite

Text

Nauman and Cygan. "On Many-Actions Policy Gradient." International Conference on Machine Learning, 2023.

Markdown

[Nauman and Cygan. "On Many-Actions Policy Gradient." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/nauman2023icml-manyactions/)

BibTeX

@inproceedings{nauman2023icml-manyactions,
  title     = {{On Many-Actions Policy Gradient}},
  author    = {Nauman, Michal and Cygan, Marek},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {25769--25789},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/nauman2023icml-manyactions/}
}