CPeSFA: Empowering SFs for Policy Learning and Transfer in Continuous Action Spaces

Abstract

Successor Features (SFs), together with Generalized Policy Improvement (GPI), form a classic transfer Reinforcement Learning (RL) framework that transfers knowledge by decoupling the task from the policy. However, SFs are value-based and cannot handle environments with continuous action spaces, since GPI transfers knowledge by maximizing over all possible actions, which is infeasible in that setting. Recently, PeSFA decoupled SFs from policies, further endowing SFs with generalization capabilities in the policy space; however, it also cannot be applied to continuous action spaces. In this paper, we introduce the Continuous PeSFA (CPeSFA) algorithm, an Actor-Critic (AC) architecture designed for learning and transferring policies in continuous action spaces. Our theoretical analysis shows that CPeSFA leverages the generalization of SFs in the policy space to accelerate learning. Experimental results across the Grid World, Reacher, and Point Maze environments demonstrate CPeSFA's superiority and its effective knowledge transfer for rapid policy learning on new tasks.
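For context, the transfer mechanism the abstract refers to is the standard SF/GPI formulation from the literature; the notation below follows Barreto et al. (2017) and is supplied here for illustration, not quoted from the paper. Rewards are assumed to factor as a feature vector times a task-specific weight, so the action-value of any policy decomposes into task-independent SFs and a task weight, and GPI improves over a library of policies by maximizing across actions:

% Standard SF decomposition and GPI, shown for context (Barreto et al., 2017);
% requires amsmath/amssymb.
\begin{align}
  Q^{\pi}(s,a) &= \mathbb{E}^{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t, a_t, s_{t+1}) \,\Big|\, s_0 = s,\ a_0 = a\Big]^{\top}\mathbf{w}
                = \psi^{\pi}(s,a)^{\top}\mathbf{w}, \\
  \pi^{\mathrm{GPI}}(s) &\in \operatorname*{arg\,max}_{a \in \mathcal{A}}\ \max_{i}\ \psi^{\pi_i}(s,a)^{\top}\mathbf{w}.
\end{align}

The outer maximization over the action set \(\mathcal{A}\) is exactly the step that becomes intractable when \(\mathcal{A}\) is continuous, which is the limitation the abstract says CPeSFA addresses with an Actor-Critic architecture.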

Cite

Text

Li et al. "CPeSFA: Empowering SFs for Policy Learning and Transfer in Continuous Action Spaces." ICML 2024 Workshops: RLControlTheory, 2024.

Markdown

[Li et al. "CPeSFA: Empowering SFs for Policy Learning and Transfer in Continuous Action Spaces." ICML 2024 Workshops: RLControlTheory, 2024.](https://mlanthology.org/icmlw/2024/li2024icmlw-cpesfa/)

BibTeX

@inproceedings{li2024icmlw-cpesfa,
  title     = {{CPeSFA: Empowering SFs for Policy Learning and Transfer in Continuous Action Spaces}},
  author    = {Li, Yining and Yang, Tianpei and Guo, Wei and Hao, Jianye and Zheng, Yan},
  booktitle = {ICML 2024 Workshops: RLControlTheory},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/li2024icmlw-cpesfa/}
}