A Duality Approach for Regret Minimization in Average-Award Ergodic Markov Decision Processes

Abstract

In light of the Bellman duality, we propose a novel value-policy gradient algorithm to explore and act in infinite-horizon Average-reward Markov Decision Process (AMDP) and show that it has sublinear regret. The algorithm is motivated by the Bellman saddle point formulation. It learns the optimal state-action distribution, which encodes a randomized policy, by interacting with the environment along a single trajectory and making primal-dual updates. The key to the analysis is to establish a connection between the min-max duality gap of Bellman saddle point and the cumulative regret of the learning agent. We show that, for ergodic AMDPs with finite state space $\mathcal{S}$ and action space $\mathcal{A}$ and uniformly bounded mixing times, the algorithm’s $T$-time step regret is $ R(T)=\tilde{\mathcal{O}}\left( \left(t_{mix}^*\right)^2 \tau^{\frac{3}{2}} \sqrt{(\tau^3 + |\mathcal{A}|) |\mathcal{S}| T} \right), $ where $t_{mix}^*$ is the worst-case mixing time, $\tau$ is an ergodicity parameter, $T$ is the number of time steps and $\tilde{\mathcal{O}}$ hides polylog factors.

Cite

Text

Gong and Wang. "A Duality Approach for Regret Minimization in Average-Award Ergodic Markov Decision Processes." Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.

Markdown

[Gong and Wang. "A Duality Approach for Regret Minimization in Average-Award Ergodic Markov Decision Processes." Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.](https://mlanthology.org/l4dc/2020/gong2020l4dc-duality/)

BibTeX

@inproceedings{gong2020l4dc-duality,
  title     = {{A Duality Approach for Regret Minimization in Average-Award Ergodic Markov Decision Processes}},
  author    = {Gong, Hao and Wang, Mengdi},
  booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
  year      = {2020},
  pages     = {862-883},
  volume    = {120},
  url       = {https://mlanthology.org/l4dc/2020/gong2020l4dc-duality/}
}