V-Learning -- a Simple, Efficient, Decentralized Algorithm for Multiagent RL
Abstract
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms even in a basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that only scales with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions for the $i^{\text{th}}$ player. This is in sharp contrast to the size of the joint action space, which is $\prod_{i=1}^m A_i$. V-learning (in its basic form) is a new class of single-agent RL algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Similar to the classical Q-learning algorithm, it performs incremental updates to the value functions. Unlike Q-learning, it only maintains estimates of V-values instead of Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
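To make the "bandit algorithm in, RL algorithm out" idea concrete, here is a minimal single-agent sketch in Python. Several things in it are assumptions and are not stated in the abstract: the learning rate `(H + 1) / (H + t)`, the optimistic initialization, the clipping of V-values, and the loss normalization follow our reading of the full paper. Plain Exp3 is a stand-in for the paper's bandit subroutine, the `bonus` term is left as a constant, and the construction of the final output policy is omitted. The names `Exp3`, `VLearner`, and `run_demo` are hypothetical.

```python
import math
import random


class Exp3:
    """Exp3 adversarial bandit (stand-in subroutine); losses must lie in [0, 1]."""

    def __init__(self, n_actions, lr=0.1):
        self.n = n_actions
        self.lr = lr
        self.cum_loss = [0.0] * n_actions  # importance-weighted loss estimates

    def policy(self):
        base = min(self.cum_loss)  # shift by the minimum for numerical stability
        weights = [math.exp(-self.lr * (l - base)) for l in self.cum_loss]
        total = sum(weights)
        return [w / total for w in weights]

    def update(self, action, loss):
        # Only the played action's loss is observed, so reweight by 1 / probability.
        self.cum_loss[action] += loss / self.policy()[action]


class VLearner:
    """One agent's state: V-values, visit counts, and one bandit per (step, state)."""

    def __init__(self, n_actions, horizon, bonus=0.0, seed=0):
        self.A, self.H, self.bonus = n_actions, horizon, bonus
        self.rng = random.Random(seed)
        self.v_tilde, self.v, self.count, self.bandits = {}, {}, {}, {}

    def value(self, h, s):
        # Steps are 0-indexed; unseen states get the optimistic value H - h.
        if h >= self.H:
            return 0.0
        return self.v.get((h, s), float(self.H - h))

    def _bandit(self, h, s):
        if (h, s) not in self.bandits:
            self.bandits[(h, s)] = Exp3(self.A)
        return self.bandits[(h, s)]

    def act(self, h, s):
        probs = self._bandit(h, s).policy()
        return self.rng.choices(range(self.A), weights=probs)[0]

    def update(self, h, s, a, r, s_next):
        key = (h, s)
        t = self.count[key] = self.count.get(key, 0) + 1
        alpha = (self.H + 1) / (self.H + t)  # assumed step size
        next_v = self.value(h + 1, s_next)
        old = self.v_tilde.get(key, float(self.H - h))
        # Incremental update of a V-value only; no Q-table is stored.
        self.v_tilde[key] = (1 - alpha) * old + alpha * (r + next_v + self.bonus)
        self.v[key] = min(float(self.H - h), self.v_tilde[key])
        # The bandit at (h, s) sees a loss in [0, 1] for the action actually played.
        self._bandit(h, s).update(a, (self.H - r - next_v) / self.H)


def run_demo(episodes=500):
    """Toy problem: one state, one step, reward 1 only for action 1."""
    learner = VLearner(n_actions=2, horizon=1)
    for _ in range(episodes):
        a = learner.act(0, 0)
        learner.update(0, 0, a, r=float(a == 1), s_next=0)
    return learner
```

Note that `update` never looks at other agents' actions. In a multiagent run under this sketch, each player would hold its own `VLearner` sized by its own action count `A_i` and feed it only its own action and reward, which is why no table over the joint action space is needed.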
Cite
Text
Jin et al. "V-Learning -- a Simple, Efficient, Decentralized Algorithm for Multiagent RL." ICLR 2022 Workshops: GMS, 2022.
Markdown
[Jin et al. "V-Learning -- a Simple, Efficient, Decentralized Algorithm for Multiagent RL." ICLR 2022 Workshops: GMS, 2022.](https://mlanthology.org/iclrw/2022/jin2022iclrw-vlearning/)
BibTeX
@inproceedings{jin2022iclrw-vlearning,
title = {{V-Learning -- a Simple, Efficient, Decentralized Algorithm for Multiagent RL}},
author = {Jin, Chi and Liu, Qinghua and Wang, Yuanhao and Yu, Tiancheng},
booktitle = {ICLR 2022 Workshops: GMS},
year = {2022},
url = {https://mlanthology.org/iclrw/2022/jin2022iclrw-vlearning/}
}