Multi-Agent Imitation Learning: Value Is Easy, Regret Is Hard

Tang, Jingwu; Swamy, Gokul; Fang, Fei; Wu, Steven

ICMLW 2024

Abstract

We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to *coordinate* a group of agents based on demonstrations of an expert doing so. Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert *within* the support of the demonstrations. While doing so is sufficient to drive the *value gap* between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents. Intuitively, this is because strategic deviations can depend on a counterfactual quantity: the coordinator's recommendations outside of the state distribution their recommendations induce. In response, we initiate the study of an alternative objective for MAIL in Markov Games we term the *regret gap* that explicitly accounts for potential deviations by agents in the group. We begin with an in-depth exploration of the relationship between the value and regret gaps. We show that while the value gap can be efficiently minimized via a direct extension of single-agent IL algorithms, even *value equivalence* can lead to an arbitrarily large regret gap. This implies that achieving regret equivalence is harder than achieving value equivalence in MAIL. We then provide a pair of efficient reductions to no-regret online convex optimization that are capable of minimizing the regret gap *(a)* under a coverage assumption on the expert (MALICE) or *(b)* with access to a queryable expert (BLADES).
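
The claim that value equivalence can coexist with a large regret gap can be illustrated with a small toy game. The sketch below is not the paper's construction: the states, actions, rewards, and the single-agent simplification of both gaps are hypothetical choices made only for illustration. The learner copies the expert on every state the expert visits, but recommends differently in a state that is reached only if the agent deviates.

```python
# Hypothetical two-step game with one strategic agent (illustration only).
START, ON_PATH, OFF_PATH = "start", "on_path", "off_path"

# Deterministic transition out of START; the other two states are terminal.
NEXT_STATE = {"follow": ON_PATH, "stray": OFF_PATH}

# Agent reward for the action played in each terminal state (assumed values).
REWARD = {
    (ON_PATH, "stay"): 1.0,
    (OFF_PATH, "deny"): 0.0,
    (OFF_PATH, "grant"): 5.0,
}

# Coordinator policies: state -> recommended action.
# They agree on START and ON_PATH and differ only on the unvisited OFF_PATH.
expert = {START: "follow", ON_PATH: "stay", OFF_PATH: "deny"}
learner = {START: "follow", ON_PATH: "stay", OFF_PATH: "grant"}


def obey(state, recommended):
    """Non-strategic agent: always plays the recommendation."""
    return recommended


def stray_at_start(state, recommended):
    """Strategic agent: ignores the first recommendation, then obeys."""
    return "stray" if state == START else recommended


DEVIATIONS = [obey, stray_at_start]


def agent_return(policy, deviation):
    """Return earned by the agent under a policy and a deviation rule."""
    first_action = deviation(START, policy[START])
    state = NEXT_STATE[first_action]
    final_action = deviation(state, policy[state])
    return REWARD[(state, final_action)]


def regret(policy):
    """Largest gain the agent can obtain by deviating (0 if none helps)."""
    base = agent_return(policy, obey)
    return max(agent_return(policy, d) - base for d in DEVIATIONS)


value_gap = agent_return(expert, obey) - agent_return(learner, obey)
regret_gap = regret(learner) - regret(expert)
print(value_gap, regret_gap)
```

In this toy, the value gap is zero because both policies produce the same on-path trajectory, while the regret gap equals the off-path reward minus the on-path reward, so raising the assumed `grant` reward makes it as large as desired.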

Cite

Text

Tang et al. "Multi-Agent Imitation Learning: Value Is Easy, Regret Is Hard." ICML 2024 Workshops: MFHAIA, 2024.

Markdown

[Tang et al. "Multi-Agent Imitation Learning: Value Is Easy, Regret Is Hard." ICML 2024 Workshops: MFHAIA, 2024.](https://mlanthology.org/icmlw/2024/tang2024icmlw-multiagent-a/)

BibTeX

@inproceedings{tang2024icmlw-multiagent-a,
  title     = {{Multi-Agent Imitation Learning: Value Is Easy, Regret Is Hard}},
  author    = {Tang, Jingwu and Swamy, Gokul and Fang, Fei and Wu, Steven},
  booktitle = {ICML 2024 Workshops: MFHAIA},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/tang2024icmlw-multiagent-a/}
}