Search or Split: Policy Gradient with Adaptive Policy Space

Abstract

Policy search is one of the most effective classes of reinforcement learning methods for solving continuous control tasks. These methods attempt to find a good policy for an agent by fixing a family of parametric policies and then searching directly for the parameters that optimize the long-term reward. However, such a parametric policy space covers only a subset of all Markovian policies, and finding a good parametrization for a given task is a challenging problem in its own right, typically left to human expertise. In this paper, we propose GAPS (Gradient-based Adaptive Policy Search), a novel model-free policy search algorithm with an adaptive policy space. We start from a simple policy space; once a good policy has been found within it, we use the observations received from the unknown environment to decide whether to expand the space. By iterating this process, we obtain a parametric policy whose structure (including the number of parameters) is fitted to the problem at hand without any prior knowledge of the task. Finally, we test our algorithm on a selection of continuous control tasks, evaluating the learning process with adaptive policy spaces and comparing the results against traditional policy optimization methods that employ a fixed policy space.
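The abstract outlines GAPS only at a high level: search within the current parametric space, then expand the space when warranted and continue. The sketch below is a minimal illustration of that loop, not the authors' algorithm; the one-step toy task, the polynomial feature expansion as the space-growing rule, the stall-based expansion test, and the REINFORCE gradient estimator are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.2  # fixed exploration std of the Gaussian policy

# Hypothetical one-step task standing in for the unknown environment:
# observe s ~ U(-1, 1), pick action a, receive reward -(a - sin(3s))^2.
def reward(s, a):
    return -(a - np.sin(3.0 * s)) ** 2

# Nested policy spaces: polynomial features of increasing degree
# (an assumed stand-in for whatever expansion rule GAPS actually uses).
def make_features(degree):
    return lambda s: np.array([s ** k for k in range(degree + 1)])

def reinforce_step(theta, features, lr=0.1, batch=200):
    """One REINFORCE (score-function) gradient step with a mean baseline."""
    states = rng.uniform(-1.0, 1.0, size=batch)
    phis = np.stack([features(s) for s in states])        # (batch, dim)
    actions = phis @ theta + SIGMA * rng.standard_normal(batch)
    rs = reward(states, actions)
    glog = (actions - phis @ theta)[:, None] / SIGMA**2 * phis
    grad = np.mean((rs - rs.mean())[:, None] * glog, axis=0)
    return theta + lr * grad

def evaluate(theta, features, n=2000):
    """Average reward of the deterministic (mean) policy."""
    states = rng.uniform(-1.0, 1.0, size=n)
    acts = np.array([theta @ features(s) for s in states])
    return float(np.mean(reward(states, acts)))

# GAPS-like outer loop: search within the current space; when progress
# stalls, expand the space and embed the old policy in the new one.
degree = 1
theta = np.zeros(degree + 1)                  # start from a simple space
features, best = make_features(degree), -np.inf
for epoch in range(40):
    for _ in range(25):                       # inner policy-gradient search
        theta = reinforce_step(theta, features)
    ret = evaluate(theta, features)
    if ret - best < 1e-3 and degree < 5:      # stalled: grow the space
        degree += 1
        features = make_features(degree)
        theta = np.append(theta, 0.0)         # new parameter starts at zero
        print(f"epoch {epoch}: expanded policy space to degree {degree}")
    best = max(best, ret)
print(f"final return {best:.4f} with {theta.size} parameters")
```

Note the design choice in the expansion step: appending a zero-initialized parameter makes the enlarged space contain the current policy exactly, so expanding can never hurt the policy found so far, which mirrors the nested-space iteration the abstract describes.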
