Multi-Agent Imitation Learning: Value Is Easy, Regret Is Hard
Abstract
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to *coordinate* a group of agents based on demonstrations of an expert doing so. Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert *within* the support of the demonstrations. While doing so is sufficient to drive the *value gap* between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents. Intuitively, this is because strategic deviations can depend on a counterfactual quantity: the coordinator's recommendations outside of the state distribution their recommendations induce. In response, we initiate the study of an alternative objective for MAIL in Markov Games we term the *regret gap* that explicitly accounts for potential deviations by agents in the group. We first perform an in-depth exploration of the relationship between the value and regret gaps. First, we show that while the value gap can be efficiently minimized via a direct extension of single-agent IL algorithms, even *value equivalence* can lead to an arbitrarily large regret gap. This implies that achieving regret equivalence is harder than achieving value equivalence in MAIL. We then provide a pair of efficient reductions to no-regret online convex optimization that are capable of minimizing the regret gap *(a)* under a coverage assumption on the expert (MALICE) or *(b)* with access to a queryable expert (BLADES).
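One way to write down the two objectives is sketched below. The notation is assumed rather than taken from the abstract, and the paper's own definitions may differ. $J_i(\sigma)$ is agent $i$'s expected return when every agent follows the recommendations of coordinator $\sigma$, $\sigma_E$ is the expert coordinator, $\Phi_i$ is a class of deviations available to agent $i$, and $\sigma_{\phi_i}$ is the behavior induced when agent $i$ applies deviation $\phi_i$ to its recommendations.

```latex
\begin{align*}
\mathrm{ValueGap}(\sigma)  &= \max_i \big( J_i(\sigma_E) - J_i(\sigma) \big) \\
\mathcal{R}(\sigma)        &= \max_i \max_{\phi_i \in \Phi_i} \big( J_i(\sigma_{\phi_i}) - J_i(\sigma) \big) \\
\mathrm{RegretGap}(\sigma) &= \mathcal{R}(\sigma) - \mathcal{R}(\sigma_E)
\end{align*}
```

Under this reading, the value gap compares returns only on the states that $\sigma$ and $\sigma_E$ themselves visit. The regret term evaluates returns under the state distribution induced by a deviation, which can reach states where the demonstrations say nothing about the expert's recommendations.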
Cite
Text
Tang et al. "Multi-Agent Imitation Learning: Value Is Easy, Regret Is Hard." ICML 2024 Workshops: MFHAIA, 2024.
Markdown
[Tang et al. "Multi-Agent Imitation Learning: Value Is Easy, Regret Is Hard." ICML 2024 Workshops: MFHAIA, 2024.](https://mlanthology.org/icmlw/2024/tang2024icmlw-multiagent-a/)
BibTeX
@inproceedings{tang2024icmlw-multiagent-a,
title = {{Multi-Agent Imitation Learning: Value Is Easy, Regret Is Hard}},
author = {Tang, Jingwu and Swamy, Gokul and Fang, Fei and Wu, Steven},
booktitle = {ICML 2024 Workshops: MFHAIA},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/tang2024icmlw-multiagent-a/}
}