LOQA: Learning with Opponent Q-Learning Awareness
Abstract
In various real-world scenarios, interactions among agents often resemble the dynamics of general-sum games, where each agent strives to optimize its own utility. Despite the ubiquitous relevance of such settings, decentralized machine learning algorithms have struggled to find equilibria that maximize individual utility while preserving social welfare. In this paper, we introduce Learning with Opponent Q-Learning Awareness (LOQA), a novel reinforcement learning algorithm tailored to optimizing an agent's individual utility while fostering cooperation among adversaries in partially competitive environments. LOQA assumes that each agent samples actions proportionally to its action-value function Q. Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game. LOQA achieves these outcomes with a significantly reduced computational footprint compared to previous works, making it a promising approach for practical multi-agent applications.
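As a rough illustration of the modeling assumption stated in the abstract (agents sample actions in proportion to their action-values), the sketch below turns a vector of Q-values into a sampling distribution. The abstract does not specify how "proportionally" is realized; the softmax form, the `temperature` parameter, the function names, and the example Q-values are all assumptions made here for illustration and are not taken from the paper.

```python
import math
import random


def q_to_policy(q_values, temperature=1.0):
    """Map action-values to probabilities with a softmax.

    Assumption: "proportional to Q" is read as pi(a) proportional to
    exp(Q(a) / temperature). The abstract itself does not fix this form.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, rng, temperature=1.0):
    """Draw one action index from the softmax policy induced by q_values."""
    probs = q_to_policy(q_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical Q-values for two actions, e.g. (cooperate, defect).
q = [1.0, 2.0]
probs = q_to_policy(q)
action = sample_action(q, random.Random(0))
```

Under this reading, an agent that models its opponent this way can reason about how changes in the opponent's Q-values shift the opponent's action probabilities; higher-valued actions are sampled more often, but every action keeps nonzero probability.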
Cite
Text
Aghajohari et al. "LOQA: Learning with Opponent Q-Learning Awareness." International Conference on Learning Representations, 2024.
Markdown
[Aghajohari et al. "LOQA: Learning with Opponent Q-Learning Awareness." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/aghajohari2024iclr-loqa/)
BibTeX
@inproceedings{aghajohari2024iclr-loqa,
title = {{LOQA: Learning with Opponent Q-Learning Awareness}},
author = {Aghajohari, Milad and Duque, Juan Agustin and Cooijmans, Tim and Courville, Aaron},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/aghajohari2024iclr-loqa/}
}