Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations
Abstract
Recent advancements in differentiable simulators highlight the potential of policy optimization using simulation gradients. Yet, these approaches are largely contingent on the continuity and smoothness of the simulation, which precludes the use of certain simulation engines, such as MuJoCo. To tackle this challenge, we introduce the adaptive analytic gradient. This method views the Q function as a surrogate for future returns, consistent with the Bellman equation. By analyzing the variance of batched gradients, our method can autonomously opt for a more resilient Q function to compute the gradient when encountering rough simulation transitions. We also put forth the Adaptive-Gradient Policy Optimization (AGPO) algorithm, which leverages our proposed method for policy learning. On the theoretical side, we demonstrate AGPO’s convergence, emphasizing its stable performance under non-smooth dynamics due to low variance. On the empirical side, our results show that AGPO effectively mitigates the challenges posed by non-smoothness in policy learning through differentiable simulation.
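The variance-based selection described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the hard threshold rule, the function names (`batch_gradient_variance`, `mean_gradient`, `select_gradient`), and the `variance_threshold` parameter are all assumptions made for illustration. The sketch only shows the general idea of measuring the spread of per-sample gradients in a batch and falling back to a Q-function-based gradient when that spread is large.

```python
from statistics import pvariance


def batch_gradient_variance(per_sample_grads):
    """Average, over coordinates, of the variance across the batch."""
    dims = len(per_sample_grads[0])
    per_dim = [pvariance([g[d] for g in per_sample_grads]) for d in range(dims)]
    return sum(per_dim) / dims


def mean_gradient(per_sample_grads):
    """Coordinate-wise mean of a batch of gradient vectors."""
    n = len(per_sample_grads)
    dims = len(per_sample_grads[0])
    return [sum(g[d] for g in per_sample_grads) / n for d in range(dims)]


def select_gradient(sim_grads, q_grads, variance_threshold):
    """Pick a gradient source (hypothetical rule, not taken from the paper).

    sim_grads: per-sample gradients obtained through the simulator.
    q_grads:   per-sample gradients obtained through a learned Q function.
    Returns a (source_name, averaged_gradient) pair.
    """
    if batch_gradient_variance(sim_grads) > variance_threshold:
        # High variance suggests rough transitions; use the Q-function path.
        return "q_function", mean_gradient(q_grads)
    return "simulation", mean_gradient(sim_grads)
```

With a batch of nearly identical simulation gradients the sketch keeps the simulation path; with widely scattered ones it switches to the Q-function path. How AGPO actually makes and applies this choice is specified in the paper itself.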
Cite
Text
Gao et al. "Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations." International Conference on Machine Learning, 2024.
Markdown
[Gao et al. "Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/gao2024icml-adaptivegradient/)
BibTeX
@inproceedings{gao2024icml-adaptivegradient,
title = {{Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations}},
author = {Gao, Feng and Shi, Liangzhi and Zhang, Shenao and Wang, Zhaoran and Wu, Yi},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {14844-14858},
volume = {235},
url = {https://mlanthology.org/icml/2024/gao2024icml-adaptivegradient/}
}