Comparing Deterministic and Soft Policy Gradients for Optimizing Gaussian Mixture Actors
Abstract
Gaussian Mixture Models (GMMs) have recently been proposed for approximating actors in actor-critic reinforcement learning algorithms. Such GMM-based actors are commonly optimized using stochastic policy gradients along with an entropy maximization objective. In contrast to previous work, we define and study deterministic policy gradients for optimizing GMM-based actors. Similar to stochastic gradient approaches, our proposed method, denoted $\textit{Gaussian Mixture Deterministic Policy Gradient}$ (Gamid-PG), encourages policy entropy maximization. To this end, we define the GMM entropy gradient using a $\textit{variational approximation}$ of the KL-divergence between the GMM's constituent Gaussians. We compare Gamid-PG with common stochastic policy gradient methods on benchmark dense-reward MuJoCo tasks and sparse-reward Fetch tasks. We observe that Gamid-PG outperforms stochastic gradient-based methods in 3/6 MuJoCo tasks while performing similarly on the remaining 3 tasks. In the Fetch tasks, Gamid-PG outperforms single-actor deterministic gradient-based methods while performing worse than stochastic policy gradient methods. Consequently, we conclude that GMMs optimized using deterministic policy gradients (1) should be favorably considered over stochastic gradients in dense-reward continuous control tasks, and (2) improve upon single-actor deterministic gradients.
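The abstract's variational entropy approximation can be illustrated with a standard construction: the Hershey–Olsen-style variational bound on GMM entropy, built from pairwise KL divergences between the mixture's component Gaussians. The sketch below is an assumption about the general form of such an estimator (the paper's exact estimator and parameterization may differ) and uses diagonal-covariance components for simplicity; all function names are illustrative.

```python
import numpy as np

def kl_diag(mu_a, var_a, mu_b, var_b):
    # KL( N(mu_a, diag(var_a)) || N(mu_b, diag(var_b)) ) in closed form
    return 0.5 * np.sum(
        np.log(var_b / var_a) + (var_a + (mu_a - mu_b) ** 2) / var_b - 1.0
    )

def gaussian_entropy(var):
    # Differential entropy of a diagonal-covariance Gaussian
    return 0.5 * np.sum(np.log(2.0 * np.pi * np.e * var))

def gmm_entropy_variational(w, mus, variances):
    """Variational upper bound on the entropy of a Gaussian mixture:
    H(f) <= sum_a w_a [ H(f_a) - log sum_b w_b exp(-KL(f_a || f_b)) ].
    w: (K,) mixture weights; mus, variances: (K, d) component parameters."""
    K = len(w)
    kl = np.array([[kl_diag(mus[a], variances[a], mus[b], variances[b])
                    for b in range(K)] for a in range(K)])
    component_h = np.array([gaussian_entropy(v) for v in variances])
    inner = np.exp(-kl) @ w  # sum_b w_b exp(-KL(f_a || f_b)), shape (K,)
    return float(np.sum(w * (component_h - np.log(inner))))
```

The bound is exact when the mixture collapses to a single Gaussian (all pairwise KLs vanish and the log term reduces to zero), and, being differentiable in the component means, variances, and weights, it yields an entropy gradient suitable for an entropy-regularized actor update.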
Cite
Text
Dey and Sharon. "Comparing Deterministic and Soft Policy Gradients for Optimizing Gaussian Mixture Actors." Transactions on Machine Learning Research, 2024.

Markdown
[Dey and Sharon. "Comparing Deterministic and Soft Policy Gradients for Optimizing Gaussian Mixture Actors." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/dey2024tmlr-comparing/)

BibTeX
@article{dey2024tmlr-comparing,
title = {{Comparing Deterministic and Soft Policy Gradients for Optimizing Gaussian Mixture Actors}},
author = {Dey, Sheelabhadra and Sharon, Guni},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/dey2024tmlr-comparing/}
}