Publications

Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games

Dingyang Chen, University of South Carolina - Columbia
Qi Zhang, University of South Carolina - Columbia
Thinh T. Doan

Document Type

Article

Abstract

We study the performance of policy gradient methods for the subclass of Markov games known as Markov potential games (MPGs), which extends the notion of normal-form potential games to the stateful setting and includes the important special case of the fully cooperative setting where the agents share an identical reward function. Our focus in this paper is to study the convergence of the policy gradient method for solving MPGs under softmax policy parameterization, both tabular and parameterized with general function approximators such as neural networks. We first show the asymptotic convergence of this method to a Nash equilibrium of MPGs for tabular softmax policies. Second, we derive the finite-time performance of the policy gradient in two settings: 1) using the log-barrier regularization, and 2) using the natural policy gradient under the best-response dynamics (NPG-BR). Finally, extending the notion of price of anarchy (POA) and smoothness in normal-form games, we introduce the POA for MPGs and provide a POA bound for NPG-BR. To our knowledge, this is the first POA bound for solving MPGs. To support our theoretical results, we empirically compare the convergence rates and POA of policy gradient variants for both tabular and neural softmax policies.

Digital Object Identifier (DOI)

https://doi.org/10.48550/arXiv.2206.07642

Publication Info

Preprint version 2022.

This article is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.

APA Citation

Chen, D., Zhang, Q., & Doan, T. T. (2022). Convergence and Price of Anarchy Guarantees of the Softmax Policy Gradient in Markov Potential Games (arXiv:2206.07642). arXiv. https://doi.org/10.48550/arXiv.2206.07642

