Stackelberg Policy Gradient: Evaluating the Performance of Leaders and Followers
Abstract
Hierarchical order of play is an important concept in reinforcement learning for better understanding the decisions made by strategic agents in a shared environment. In this paper, we compare the learning dynamics of Stackelberg and simultaneous reinforcement learning agents. Agents are trained using their policy gradients and are tested against each other in a tournament. We compare agent performance in zero-sum and non-zero-sum Markov games. We show that, under the same parameters, the Stackelberg leader performs better during training. However, with the same parameters in the tournament setting, Stackelberg leaders and followers perform similarly to the simultaneous players. Analytically, hierarchical training can potentially provide stronger guarantees for policy gradient methods.
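For background, in Stackelberg learning the leader differentiates through the follower's best response (via the implicit function theorem), whereas a simultaneous learner follows only its individual gradient. A minimal sketch, assuming leader cost $f_1$, follower cost $f_2$, and decision variables $x_1$ (leader) and $x_2$ (follower); this is the generic Stackelberg gradient from related work, not necessarily the exact policy gradient estimator used in this paper:

$$
\nabla^{\mathrm{S}} f_1(x_1, x_2)
  = \nabla_{x_1} f_1
  - \big(\nabla^2_{x_2 x_1} f_2\big)^{\!\top}
    \big(\nabla^2_{x_2 x_2} f_2\big)^{-1}
    \nabla_{x_2} f_1,
$$

while a simultaneous-play agent updates with the individual gradient $\nabla_{x_1} f_1$ alone.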
Cite

Text
Vu et al. "Stackelberg Policy Gradient: Evaluating the Performance of Leaders and Followers." ICLR 2022 Workshops: GMS, 2022.

Markdown
[Vu et al. "Stackelberg Policy Gradient: Evaluating the Performance of Leaders and Followers." ICLR 2022 Workshops: GMS, 2022.](https://mlanthology.org/iclrw/2022/vu2022iclrw-stackelberg/)

BibTeX
@inproceedings{vu2022iclrw-stackelberg,
title = {{Stackelberg Policy Gradient: Evaluating the Performance of Leaders and Followers}},
author = {Vu, Quoc-Liem and Alumbaugh, Zane and Ching, Ryan and Ding, Quanchen and Mahajan, Arnav and Chasnov, Benjamin and Burden, Sam and Ratliff, Lillian J},
booktitle = {ICLR 2022 Workshops: GMS},
year = {2022},
url = {https://mlanthology.org/iclrw/2022/vu2022iclrw-stackelberg/}
}