Multi-Agent Off-Policy TDC with Near-Optimal Sample and Communication Complexities
Abstract
The finite-time convergence of off-policy temporal difference (TD) learning has recently been studied comprehensively. However, this type of convergence has not been established for off-policy TD learning in the multi-agent setting, which covers broader reinforcement learning applications and is fundamentally more challenging. This work develops a decentralized TD with correction (TDC) algorithm for multi-agent off-policy TD learning under Markovian sampling. In particular, our algorithm avoids sharing the actions, policies and rewards of the agents, and adopts mini-batch sampling to reduce the sampling variance and communication frequency. Under Markovian sampling and linear function approximation, we prove that the finite-time sample complexity of our algorithm for achieving an $\epsilon$-accurate solution is on the order of $\mathcal{O}\big(\frac{M\ln\epsilon^{-1}}{\epsilon(1-\sigma_2)^2}\big)$, where $M$ denotes the total number of agents and $\sigma_2$ is a network parameter. This matches the sample complexity of the centralized TDC. Moreover, our algorithm achieves the optimal communication complexity $\mathcal{O}\big(\frac{\sqrt{M}\ln\epsilon^{-1}}{1-\sigma_2}\big)$ for synchronizing the value function parameters, which is order-wise lower than the communication complexity of the existing decentralized TD(0). Numerical simulations corroborate our theoretical findings.
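The two ingredients named in the abstract, a mini-batch TDC update and a parameter-synchronization round, can be illustrated with a minimal sketch. It assumes the standard TDC update with linear features and importance-sampling ratios, and one round of averaging with a doubly stochastic mixing matrix. The function names, the batch layout `(phi, phi_next, reward, rho)`, and the matrix `mixing` are hypothetical choices for illustration; this is not the paper's exact algorithm.

```python
import numpy as np

def tdc_minibatch_step(theta, w, batch, alpha, beta, gamma):
    """One mini-batch TDC update for a single agent (assumed standard form).

    batch: list of (phi, phi_next, reward, rho) tuples, where phi and phi_next
    are feature vectors and rho is the importance-sampling ratio.
    """
    g_theta = np.zeros_like(theta)
    g_w = np.zeros_like(w)
    for phi, phi_next, reward, rho in batch:
        # Temporal-difference error under linear function approximation.
        delta = reward + gamma * (phi_next @ theta) - (phi @ theta)
        # Main update direction with the TDC correction term.
        g_theta += rho * (delta * phi - gamma * (phi @ w) * phi_next)
        # Auxiliary update direction for the correction weights w.
        g_w += rho * delta * phi - (phi @ w) * phi
    n = len(batch)
    # Averaging over the mini-batch reduces the variance of each step.
    return theta + alpha * g_theta / n, w + beta * g_w / n

def gossip_average(params, mixing):
    """One communication round: row m of the result is agent m's weighted
    average of the parameters in `params` (shape: agents x dimension).
    `mixing` is assumed to be doubly stochastic."""
    return mixing @ params

# Tiny usage example with made-up numbers.
theta0, w0 = np.zeros(2), np.zeros(2)
batch = [(np.array([1.0, 0.0]), np.array([0.0, 1.0]), 1.0, 1.0)]
theta1, w1 = tdc_minibatch_step(theta0, w0, batch, alpha=0.1, beta=0.2, gamma=0.9)

mixing = np.array([[0.50, 0.25, 0.25],
                   [0.25, 0.50, 0.25],
                   [0.25, 0.25, 0.50]])
params = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
mixed = gossip_average(params, mixing)
```

Because `mixing` is doubly stochastic, each averaging round preserves the network-wide mean of the parameters while pulling the agents toward consensus. Larger mini-batches mean fewer updates, and therefore fewer such rounds, for the same number of samples.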
Cite
Text
Chen et al. "Multi-Agent Off-Policy TDC with Near-Optimal Sample and Communication Complexities." Transactions on Machine Learning Research, 2022.
Markdown
[Chen et al. "Multi-Agent Off-Policy TDC with Near-Optimal Sample and Communication Complexities." Transactions on Machine Learning Research, 2022.](https://mlanthology.org/tmlr/2022/chen2022tmlr-multiagent/)
BibTeX
@article{chen2022tmlr-multiagent,
title = {{Multi-Agent Off-Policy TDC with Near-Optimal Sample and Communication Complexities}},
author = {Chen, Ziyi and Zhou, Yi and Chen, Rong-Rong},
journal = {Transactions on Machine Learning Research},
year = {2022},
url = {https://mlanthology.org/tmlr/2022/chen2022tmlr-multiagent/}
}