Policy Gradient Play with Networked Agents in Markov Potential Games
Abstract
We introduce a distributed policy gradient play algorithm for networked agents playing Markov potential games. At each stage of the game, agents receive rewards that depend on the joint actions taken in a common dynamic state. Agents implement parameterized and differentiable policies to take actions against each other. A Markov potential game assumes the existence of a potential value function; in a differentiable Markov potential game, the partial gradients of the potential function equal the local gradients with respect to the individual policy parameters. In this work, agents receive information on other agents’ parameters via a communication network, in addition to their rewards. Agents then update their policy parameters using stochastic gradients with respect to local estimates of the joint policy parameters. We show that the agents’ joint policy converges to a first-order stationary point of the Markov potential value function for any type of function approximation and any state and action spaces. Numerical experiments confirm the convergence result on the lake game, a Markov potential game.
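The update scheme described in the abstract, a consensus step on neighbors’ parameter estimates followed by a local stochastic gradient step, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper’s algorithm: the ring network, the quadratic stand-in potential, the noise level, and the step size are all choices made here for the sketch.

```python
import numpy as np

# Sketch (assumption): n agents each keep a local estimate of the
# full joint parameter vector. Each round they (1) average their
# estimates with network neighbors using a doubly stochastic weight
# matrix W, then (2) take a stochastic gradient step on their own
# coordinate. A toy quadratic potential Phi(theta) =
# -0.5 * ||theta - theta_star||^2 stands in for the Markov potential
# value function, so its partial gradients play the role of the
# local policy gradients.

rng = np.random.default_rng(0)
n, steps, eta = 4, 3000, 0.05
theta_star = np.array([1.0, -2.0, 0.5, 3.0])  # stationary point of Phi

# Ring communication network: each agent mixes with its two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

est = np.zeros((n, n))  # est[i] = agent i's estimate of the joint parameters
for _ in range(steps):
    est = W @ est  # consensus: average neighbors' estimates
    for i in range(n):
        grad_i = theta_star[i] - est[i, i]          # dPhi/dtheta_i at local estimate
        noisy_grad = grad_i + 0.01 * rng.standard_normal()
        est[i, i] += eta * noisy_grad               # local stochastic gradient step

# Each agent's own coordinate approaches the stationary point theta_star.
```

The consensus step drives all local estimates toward agreement, while the gradient step drives the agreed-upon joint parameter toward a first-order stationary point of the potential; in this toy quadratic case the two effects combine to recover `theta_star` at every agent.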
Cite
Text
Aydin and Eksin. "Policy Gradient Play with Networked Agents in Markov Potential Games." Proceedings of The 5th Annual Learning for Dynamics and Control Conference, 2023.
Markdown
[Aydin and Eksin. "Policy Gradient Play with Networked Agents in Markov Potential Games." Proceedings of The 5th Annual Learning for Dynamics and Control Conference, 2023.](https://mlanthology.org/l4dc/2023/aydin2023l4dc-policy/)
BibTeX
@inproceedings{aydin2023l4dc-policy,
title = {{Policy Gradient Play with Networked Agents in Markov Potential Games}},
author = {Aydin, Sarper and Eksin, Ceyhun},
booktitle = {Proceedings of The 5th Annual Learning for Dynamics and Control Conference},
year = {2023},
pages = {184--195},
volume = {211},
url = {https://mlanthology.org/l4dc/2023/aydin2023l4dc-policy/}
}