Federated Natural Policy Gradient and Actor Critic Methods for Multi-Task Reinforcement Learning

Abstract

Federated reinforcement learning (RL) enables collaborative decision making among multiple distributed agents without sharing local data trajectories. In this work, we consider a multi-task setting, in which each agent has its own private reward function, corresponding to a different task, while sharing the same transition kernel of the environment. Focusing on infinite-horizon Markov decision processes, the goal is to learn, in a decentralized manner, a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents, where each agent only communicates with its neighbors over some prescribed graph topology. We develop federated vanilla and entropy-regularized natural policy gradient (NPG) methods in the tabular setting under softmax parameterization, where gradient tracking is applied to estimate the global Q-function to mitigate the impact of imperfect information sharing. We establish non-asymptotic global convergence guarantees under exact policy evaluation, where the rates are nearly independent of the size of the state-action space and illuminate the impacts of network size and connectivity. To the best of our knowledge, this is the first time that global convergence has been established for federated multi-task RL using policy optimization. We further go beyond the tabular setting by proposing a federated natural actor critic (NAC) method for multi-task RL with function approximation, and establish its finite-time sample complexity while accounting for the errors of function approximation.
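For concreteness, a rough sketch (not reproduced from the paper) of the multi-task objective and the entropy-regularized NPG update under softmax parameterization can be written as follows. Here N denotes the number of agents, r_n the private reward of agent n, γ the discount factor, ρ the initial state distribution, τ ≥ 0 the entropy-regularization strength, and η the step size; these symbols follow standard notation and are not defined on this page.

\[
\max_{\pi}\ \frac{1}{N}\sum_{n=1}^{N} V_{r_n}^{\pi}(\rho),
\qquad
V_{r_n}^{\pi}(\rho) \;=\; \mathbb{E}_{s_0\sim\rho,\;a_t\sim\pi(\cdot\,|\,s_t)}
\Big[\textstyle\sum_{t=0}^{\infty}\gamma^{t}\, r_n(s_t,a_t)\Big],
\]

\[
\pi^{(t+1)}(a\,|\,s)\;\propto\;
\pi^{(t)}(a\,|\,s)^{\,1-\frac{\eta\tau}{1-\gamma}}\,
\exp\!\Big(\tfrac{\eta}{1-\gamma}\,\widehat{Q}^{(t)}(s,a)\Big),
\]

where, in the federated variant described in the abstract, \(\widehat{Q}^{(t)}\) stands for each agent's gradient-tracked estimate of the global (average) Q-function, maintained by mixing information with its neighbors over the communication graph; setting τ = 0 recovers the vanilla NPG step.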

Cite

Text

Yang et al. "Federated Natural Policy Gradient and Actor Critic Methods for Multi-Task Reinforcement Learning." Neural Information Processing Systems, 2024. doi:10.52202/079017-3855

Markdown

[Yang et al. "Federated Natural Policy Gradient and Actor Critic Methods for Multi-Task Reinforcement Learning." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/yang2024neurips-federated-a/) doi:10.52202/079017-3855

BibTeX

@inproceedings{yang2024neurips-federated-a,
  title     = {{Federated Natural Policy Gradient and Actor Critic Methods for Multi-Task Reinforcement Learning}},
  author    = {Yang, Tong and Cen, Shicong and Wei, Yuting and Chen, Yuxin and Chi, Yuejie},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3855},
  url       = {https://mlanthology.org/neurips/2024/yang2024neurips-federated-a/}
}