Diverse Conventions for Human-AI Collaboration

Abstract

Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce mixed-play, where an initial state is randomly generated by sampling self-play and cross-play transitions and the player learns to maximize the self-play reward from this initial state. We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users.
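
As a rough illustration of the training objective the abstract describes, the sketch below combines the three terms: the self-play return (maximized), the cross-play return against previously discovered conventions (minimized, to push conventions apart), and the self-play return from mixed-play start states (maximized, to keep cross-play behavior in good faith). This is a minimal sketch, not the authors' implementation; the helper name, the averaging over earlier conventions, and the weights `alpha` and `beta` are assumptions made here for illustration.

```python
# Illustrative sketch (not the paper's code) of the diverse-convention objective:
# maximize self-play reward, minimize cross-play reward against previously
# discovered conventions, and maximize self-play reward from mixed-play start
# states. `alpha` and `beta` are assumed weighting hyperparameters.
from typing import Sequence


def diverse_convention_objective(
    self_play_return: float,
    cross_play_returns: Sequence[float],
    mixed_play_return: float,
    alpha: float = 1.0,   # weight on the cross-play (diversity) penalty
    beta: float = 1.0,    # weight on the mixed-play (good-faith) bonus
) -> float:
    """Scalar objective to maximize when training a new convention.

    self_play_return: average return of the new policy paired with itself.
    cross_play_returns: average returns when paired with each previously
        discovered convention; penalizing these drives semantic diversity.
    mixed_play_return: self-play return from start states sampled along
        mixed self-play/cross-play rollouts, discouraging deliberate
        sabotage during cross-play.
    """
    avg_cross_play = sum(cross_play_returns) / max(len(cross_play_returns), 1)
    return self_play_return - alpha * avg_cross_play + beta * mixed_play_return


# Example with dummy return estimates: a convention that scores well with
# itself, poorly with earlier conventions, and still acts in good faith
# from mixed-play start states.
print(diverse_convention_objective(8.0, [2.0, 1.5], 7.5))
```

In practice each return would be estimated from rollouts of the corresponding pairing, and the objective would be optimized with a standard policy-gradient method; the scalar form above only shows how the three terms trade off.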

Cite

Text

Sarkar et al. "Diverse Conventions for Human-AI Collaboration." Neural Information Processing Systems, 2023.

Markdown

[Sarkar et al. "Diverse Conventions for Human-AI Collaboration." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/sarkar2023neurips-diverse/)

BibTeX

@inproceedings{sarkar2023neurips-diverse,
  title     = {{Diverse Conventions for Human-AI Collaboration}},
  author    = {Sarkar, Bidipta and Shih, Andy and Sadigh, Dorsa},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/sarkar2023neurips-diverse/}
}