V-Learning -- a Simple, Efficient, Decentralized Algorithm for Multiagent RL

Abstract

A major challenge of multiagent reinforcement learning (MARL) is \emph{the curse of multiagents}, where the size of the joint action space scales exponentially with the number of agents. This remains a bottleneck for designing efficient MARL algorithms even in a basic scenario with finitely many states and actions. This paper resolves this challenge for the model of episodic Markov games. We design a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that scales only with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions for the $i^{\text{th}}$ player. This is in sharp contrast to the size of the joint action space, which is $\prod_{i=1}^m A_i$. V-learning (in its basic form) is a new class of single-agent RL algorithms that convert any adversarial bandit algorithm with suitable regret guarantees into an RL algorithm. Similar to the classical Q-learning algorithm, it performs incremental updates to the value functions. Unlike Q-learning, it maintains only estimates of V-values instead of Q-values. This key difference allows V-learning to achieve the claimed guarantees in the MARL setting by simply letting all agents run V-learning independently.
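The single-agent idea described in the abstract (an adversarial bandit per state and step, plus incremental updates to V-values only) can be illustrated with a rough sketch. This is not the paper's exact algorithm: the abstract does not give the step size, the exploration bonus, or the bandit subroutine, so the step size `(H + 1) / (H + t)`, the loss normalization, the omitted bonus, and the generic exponential-weights bandit below are all assumptions. The names `ExpWeightsBandit`, `v_learning_sketch`, `toy_reset`, and `toy_step` are hypothetical and exist only for illustration.

```python
import math
import random


class ExpWeightsBandit:
    """Generic exponential-weights bandit (a stand-in, not the paper's choice)."""

    def __init__(self, n_actions, lr=0.1, gamma=0.05):
        self.n, self.lr, self.gamma = n_actions, lr, gamma
        self.log_w = [0.0] * n_actions

    def probs(self):
        m = max(self.log_w)  # subtract the max for numerical stability
        w = [math.exp(x - m) for x in self.log_w]
        z = sum(w)
        return [x / z for x in w]

    def sample(self, rng):
        return rng.choices(range(self.n), weights=self.probs())[0]

    def update(self, action, loss):
        # Importance-weighted loss estimate for the played action only;
        # gamma keeps the estimate bounded when the probability is small.
        p = self.probs()[action]
        self.log_w[action] -= self.lr * loss / (p + self.gamma)


def v_learning_sketch(env_step, env_reset, S, A, H, episodes, seed=0):
    """Tabular sketch: V-values only, one bandit per (step, state) pair."""
    rng = random.Random(seed)
    # V[h][s] is capped at H - h, the most reward still collectable; V[H] = 0.
    V = [[float(H - h)] * S for h in range(H)] + [[0.0] * S]
    V_tilde = [row[:] for row in V]
    N = [[0] * S for _ in range(H)]
    bandits = [[ExpWeightsBandit(A) for _ in range(S)] for _ in range(H)]
    for _ in range(episodes):
        s = env_reset(rng)
        for h in range(H):
            a = bandits[h][s].sample(rng)
            r, s_next = env_step(h, s, a, rng)  # reward assumed in [0, 1]
            N[h][s] += 1
            alpha = (H + 1) / (H + N[h][s])  # assumed step-size schedule
            target = r + V[h + 1][s_next]  # exploration bonus omitted here
            V_tilde[h][s] = (1 - alpha) * V_tilde[h][s] + alpha * target
            V[h][s] = min(float(H - h), V_tilde[h][s])
            # Turn the observed return into a loss in [0, 1] for the bandit.
            bandits[h][s].update(a, (H - target) / H)
            s = s_next
    return V, bandits


def toy_reset(rng):
    """Hypothetical 2-state toy environment: uniform random start state."""
    return rng.randrange(2)


def toy_step(h, s, a, rng):
    """Reward 1 when the action index matches the state; random next state."""
    return (1.0 if a == s else 0.0), rng.randrange(2)
```

Calling `v_learning_sketch(toy_step, toy_reset, S=2, A=2, H=3, episodes=500)` returns the V-table and the per-state bandits. In a multiagent version along the lines the abstract describes, each agent would run its own copy of this loop over its own `A_i` actions, so no table is ever indexed by the joint action.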

Cite

Text

Jin et al. "V-Learning -- a Simple, Efficient, Decentralized Algorithm for Multiagent RL." ICLR 2022 Workshops: GMS, 2022.

Markdown

[Jin et al. "V-Learning -- a Simple, Efficient, Decentralized Algorithm for Multiagent RL." ICLR 2022 Workshops: GMS, 2022.](https://mlanthology.org/iclrw/2022/jin2022iclrw-vlearning/)

BibTeX

@inproceedings{jin2022iclrw-vlearning,
  title     = {{V-Learning -- a Simple, Efficient, Decentralized Algorithm for Multiagent RL}},
  author    = {Jin, Chi and Liu, Qinghua and Wang, Yuanhao and Yu, Tiancheng},
  booktitle = {ICLR 2022 Workshops: GMS},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/jin2022iclrw-vlearning/}
}