Diverse Conventions for Human-AI Collaboration
Abstract
Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), encouraging conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce mixed-play, where an initial state is randomly generated by sampling self-play and cross-play transitions and the player learns to maximize the self-play reward from this initial state. We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that it can adapt to the conventions of humans, surpassing human-level performance when paired with real users.
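The two ingredients sketched in the abstract, a diversity objective that rewards self-play while penalizing cross-play with earlier conventions, and mixed-play start states, can be illustrated with a short sketch. The names below (`diversity_loss`, `alpha`, `sample_mixed_play_start`) are illustrative assumptions rather than the authors' implementation, and the sketch omits the surrounding reinforcement learning training loop.

```python
import random
import torch


def diversity_loss(sp_return, xp_returns, alpha=1.0):
    """Sketch of the diversity objective: maximize the self-play return of the
    new convention while minimizing its cross-play return against each
    previously discovered convention. `alpha` is a hypothetical weight."""
    xp_penalty = torch.stack(xp_returns).mean() if xp_returns else torch.tensor(0.0)
    # The optimizer minimizes this loss, so the self-play term is negated.
    return -sp_return + alpha * xp_penalty


def sample_mixed_play_start(sp_states, xp_states):
    """Sketch of mixed-play: draw an initial state from either a self-play or a
    cross-play rollout; the agent is then trained to maximize its self-play
    return starting from that state."""
    pool = sp_states if random.random() < 0.5 else xp_states
    return random.choice(pool)
```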
Cite
Text
Sarkar et al. "Diverse Conventions for Human-AI Collaboration." Neural Information Processing Systems, 2023.
Markdown
[Sarkar et al. "Diverse Conventions for Human-AI Collaboration." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/sarkar2023neurips-diverse/)
BibTeX
@inproceedings{sarkar2023neurips-diverse,
  title = {{Diverse Conventions for Human-AI Collaboration}},
  author = {Sarkar, Bidipta and Shih, Andy and Sadigh, Dorsa},
  booktitle = {Neural Information Processing Systems},
  year = {2023},
  url = {https://mlanthology.org/neurips/2023/sarkar2023neurips-diverse/}
}