Dual Approximation Policy Optimization
Abstract
We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the $L_2$-norm to measure function approximation errors, DAPO uses the dual Bregman divergence induced by the mirror map for policy projection. This duality framework has both theoretical and practical implications: not only does it achieve fast linear convergence with general function approximation, but it also includes several well-known practical methods as special cases, immediately providing strong convergence guarantees.
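As a rough illustration of the setting (a sketch, not taken from the paper itself), the standard policy mirror descent update with mirror map $\phi$ and step size $\eta_k$ can be written as

\[
\pi_{k+1}(\cdot \mid s) \;=\; \arg\max_{p \in \Delta(\mathcal{A})}
\Big\{ \eta_k \,\langle Q^{\pi_k}(s,\cdot),\, p \rangle \;-\; D_\phi\big(p,\, \pi_k(\cdot \mid s)\big) \Big\},
\qquad
D_\phi(p, q) \;=\; \phi(p) - \phi(q) - \langle \nabla\phi(q),\, p - q \rangle .
\]

With the negative-entropy mirror map, $D_\phi$ reduces to the KL divergence and the update recovers the familiar softmax/natural-policy-gradient form $\pi_{k+1} \propto \pi_k \exp(\eta_k Q^{\pi_k})$. The distinction drawn in the abstract is where approximation error is measured: rather than the $L_2$-norm, DAPO measures it with the Bregman divergence induced by the conjugate mirror map in the dual space.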
Cite
Text
Xiong et al. "Dual Approximation Policy Optimization." ICML 2024 Workshops: ARLET, 2024.
Markdown
[Xiong et al. "Dual Approximation Policy Optimization." ICML 2024 Workshops: ARLET, 2024.](https://mlanthology.org/icmlw/2024/xiong2024icmlw-dual/)
BibTeX
@inproceedings{xiong2024icmlw-dual,
  title = {{Dual Approximation Policy Optimization}},
  author = {Xiong, Zhihan and Fazel, Maryam and Xiao, Lin},
  booktitle = {ICML 2024 Workshops: ARLET},
  year = {2024},
  url = {https://mlanthology.org/icmlw/2024/xiong2024icmlw-dual/}
}