Gap-Dependent Bounds for Federated $Q$-Learning

Abstract

We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing federated reinforcement learning (FRL) methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term that scales with the number of agents $M$, states $S$, and actions $A$, where $T$ is the average total number of steps per agent. In contrast, our novel framework leverages the benign structures of MDPs, such as a strictly positive suboptimality gap, to achieve a $\log T$-type regret bound and a refined communication cost bound that disentangles exploration and exploitation. Our gap-dependent regret bound reveals a distinct multi-agent speedup pattern, and our gap-dependent communication cost bound removes the dependence on $MSA$ from the $\log T$ term. Notably, our gap-dependent communication cost bound also yields a better global switching cost when $M = 1$, removing $SA$ from the $\log T$ term.
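For context, the suboptimality gap referenced above is standardly defined as follows in the gap-dependent reinforcement learning literature (this definition is included here as background and is not quoted from the paper): for each stage $h$, state $s$, and action $a$,

$$\Delta_h(s,a) = V_h^{*}(s) - Q_h^{*}(s,a), \qquad \Delta_{\min} = \min_{(h,s,a)\,:\,\Delta_h(s,a) > 0} \Delta_h(s,a),$$

where $V_h^{*}$ and $Q_h^{*}$ denote the optimal value and action-value functions. When $\Delta_{\min} > 0$, every suboptimal action incurs at least a constant instantaneous loss, which is what typically allows worst-case $\sqrt{T}$-type regret to be sharpened to $\log T$-type regret scaling inversely with $\Delta_{\min}$.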

Cite

Text

Zhang et al. "Gap-Dependent Bounds for Federated $Q$-Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Zhang et al. "Gap-Dependent Bounds for Federated $Q$-Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-gapdependent/)

BibTeX

@inproceedings{zhang2025icml-gapdependent,
  title     = {{Gap-Dependent Bounds for Federated $Q$-Learning}},
  author    = {Zhang, Haochen and Zheng, Zhong and Xue, Lingzhou},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {77107--77151},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/zhang2025icml-gapdependent/}
}