A Primal-Dual Perspective for Distributed TD-Learning
Abstract
The goal of this paper is to investigate distributed temporal difference (TD) learning for networked multi-agent Markov decision processes. The proposed approach builds on distributed optimization algorithms that can be interpreted as primal-dual ordinary differential equation (ODE) dynamics subject to null-space constraints. Leveraging the exponential convergence of these dynamics, we analyze the behavior of the final iterate in various distributed TD-learning scenarios, covering both constant and diminishing step-sizes and both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the underlying communication network to be characterized by a doubly stochastic matrix.
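To make the abstract's idea concrete, here is a minimal illustrative sketch (not the paper's exact algorithm): a distributed fixed-point iteration written as discretized primal-dual dynamics over a network, where a graph Laplacian encodes the null-space (consensus) constraint and a dual variable enforces it. All constants, the graph, and the scalar "TD-style" drift terms below are assumptions chosen for illustration.

```python
import numpy as np

# Line graph on 3 agents; note the Laplacian need not arise from a
# doubly stochastic matrix (a point the paper emphasizes).
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])

a = np.array([-1.0, -2.0, -1.5])   # local drift coefficients (stable: a_i < 0)
b = np.array([ 1.0,  2.0,  3.0])   # local offsets

# The network-wide fixed point solves sum_i (a_i * theta + b_i) = 0.
theta_star = -b.sum() / a.sum()    # = 4/3 for the values above

theta = np.zeros(3)   # primal variables (one parameter per agent)
w = np.zeros(3)       # dual variables enforcing consensus (L @ theta = 0)
alpha = 0.02          # small constant step-size (Euler discretization)

for _ in range(50_000):
    # Primal step: local drift + consensus coupling + dual correction.
    theta_next = theta + alpha * (a * theta + b - L @ theta - L @ w)
    # Dual step: integrates the consensus violation L @ theta.
    w = w + alpha * (L @ theta)
    theta = theta_next

print(theta)  # every agent's parameter approaches theta_star
```

At equilibrium the dual update forces `L @ theta = 0` (all agents agree), and summing the primal stationarity condition over agents recovers the global fixed point, mirroring the null-space-constrained primal-dual ODE viewpoint described above.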
Cite
Text
Lim and Lee. "A Primal-Dual Perspective for Distributed TD-Learning." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/634
Markdown
[Lim and Lee. "A Primal-Dual Perspective for Distributed TD-Learning." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/lim2025ijcai-primal/) doi:10.24963/IJCAI.2025/634
BibTeX
@inproceedings{lim2025ijcai-primal,
title = {{A Primal-Dual Perspective for Distributed TD-Learning}},
author = {Lim, Han-Dong and Lee, Donghwan},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {5698--5706},
doi = {10.24963/IJCAI.2025/634},
url = {https://mlanthology.org/ijcai/2025/lim2025ijcai-primal/}
}