Policy Gradient Methods with Adaptive Policy Spaces
Abstract
Policy search is one of the most effective classes of reinforcement learning methods for solving continuous control tasks. These methods attempt to find a good policy for an agent by fixing a family of parametric policies and then searching directly for the parameters that maximize the long-term reward. However, this parametric policy space represents only a subset of all possible Markovian policies, and finding a good parametrization for a given task is a challenging problem in its own right, typically left to human expertise. In this paper, we propose a novel, model-free, adaptive-space policy search algorithm, GAPS (Gradient-based Adaptive Policy Search). We start from a simple policy space; then, based on the observations received from the unknown environment, we build a sequence of policy spaces of increasing complexity, which yield more sophisticated optimized policies at each epoch. The final result is a parametric policy whose structure (including the number of parameters) is fitted to the problem at hand without any prior knowledge of the task. Finally, we test our algorithm on a selection of continuous control tasks, evaluating the sequence of policies thus obtained and comparing the results with traditional policy optimization methods that employ a fixed policy space.
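For intuition only, below is a minimal, self-contained sketch of the general idea the abstract describes: a REINFORCE-style policy gradient loop in which the parametric policy space is enlarged between epochs and the previous parameters are carried over as a warm start. This is not the authors' GAPS algorithm (see the paper for its actual construction); the toy environment, polynomial feature map, and hyperparameters are illustrative assumptions.

```python
# Illustrative sketch (not GAPS): policy gradient over a growing policy space.
# The policy at epoch k is a Gaussian policy linear in polynomial state features
# of degree k; each epoch enlarges the feature map and warm-starts old weights.
import numpy as np

def features(s, degree):
    # Polynomial features [1, s, s^2, ..., s^degree] of a scalar state.
    return np.array([s ** i for i in range(degree + 1)])

def rollout(theta, degree, sigma=0.3, horizon=30, rng=None):
    # Toy 1-D point-mass task (hypothetical): move the state toward the target 1.0.
    rng = rng if rng is not None else np.random.default_rng()
    s, ret, score = 0.0, 0.0, np.zeros_like(theta)
    for _ in range(horizon):
        phi = features(s, degree)
        mean = phi @ theta
        a = mean + sigma * rng.normal()            # Gaussian policy sample
        score += (a - mean) / sigma ** 2 * phi     # log-policy gradient for this step
        s = float(np.clip(s + 0.1 * a, -2.0, 2.0))
        ret += -abs(s - 1.0)                       # reward: negative distance to target
    return ret, score

def adaptive_policy_search(epochs=3, iters=200, batch=16, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    degree = 1
    theta = np.zeros(degree + 1)
    for epoch in range(epochs):
        for _ in range(iters):
            # Vanilla REINFORCE update within the current (fixed) policy space.
            g = np.zeros_like(theta)
            for _ in range(batch):
                ret, score = rollout(theta, degree, rng=rng)
                g += ret * score
            theta += lr * g / batch
        if epoch < epochs - 1:
            # Enlarge the policy space: add one feature, keep old weights as warm start.
            degree += 1
            theta = np.concatenate([theta, [0.0]])
    return theta, degree
```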
Cite
Text
Tedeschi et al. "Policy Gradient Methods with Adaptive Policy Spaces." ICML 2024 Workshops: ARLET, 2024.
Markdown
[Tedeschi et al. "Policy Gradient Methods with Adaptive Policy Spaces." ICML 2024 Workshops: ARLET, 2024.](https://mlanthology.org/icmlw/2024/tedeschi2024icmlw-policy/)
BibTeX
@inproceedings{tedeschi2024icmlw-policy,
  title = {{Policy Gradient Methods with Adaptive Policy Spaces}},
  author = {Tedeschi, Gianmarco and Papini, Matteo and Restelli, Marcello},
  booktitle = {ICML 2024 Workshops: ARLET},
  year = {2024},
  url = {https://mlanthology.org/icmlw/2024/tedeschi2024icmlw-policy/}
}