A Tighter Analysis of Randomised Policy Iteration
Abstract
Policy Iteration (PI) is a popular family of algorithms to compute an optimal policy for a given Markov Decision Problem (MDP). Starting from an arbitrary initial policy, PI repeatedly performs locally-improving switches until an optimal policy is found. The exact form of the switching rule gives rise to different variants of PI. Two decades ago, Mansour and Singh [1999] provided the first non-trivial “strong” upper bound on the number of iterations taken by “Howard’s PI” (HPI), a widely-used variant of PI (strong bounds depend only on the number of states and actions in the MDP). They also proposed a randomised variant (RPI) and showed an even tighter strong upper bound. Their bounds for HPI and RPI have not been improved subsequently.

We revisit the algorithms and analysis of Mansour and Singh [1999]. We prove a novel result on the structure of the policy space for $k$-action MDPs, $k \geq 2$, which generalises a result known for $k = 2$. Also proposing a new counting argument, we obtain a strong bound of $(O(\sqrt{k \log k}))^n$ iterations for an algorithm akin to RPI, improving significantly upon Mansour and Singh’s original bound of roughly $O((k/2)^n)$. Similar analysis of a randomised variant of HPI also yields a strong upper bound of $(O(\sqrt{k \log k}))^n$ iterations, registering the first exponential improvement for HPI over the trivial bound of $k^n$. Our other contributions include a lower bound of $\Omega(n)$ iterations for RPI and an upper bound of $1.6001^n$ iterations for a randomised variant of “Batch-Switching PI” [Kalyanakrishnan et al., 2016a] on 2-action MDPs—the tightest strong upper bound shown yet for the PI family.
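To make the switching rule concrete, a minimal sketch of Howard's PI (the variant that greedily switches every improvable state in each iteration) on a hypothetical toy MDP is given below. The transition/reward structure, function name, and encoding (`P[s][a]` as a list of `(next_state, prob)` pairs) are illustrative assumptions, not the paper's notation; randomised variants such as RPI differ only in which improvable states are switched.

```python
def policy_iteration(n_states, n_actions, P, R, gamma=0.9, tol=1e-9):
    """Howard's PI sketch. P[s][a] is a list of (next_state, prob);
    R[s][a] is the immediate reward. Returns (optimal policy, values)."""
    policy = [0] * n_states  # arbitrary initial policy
    while True:
        # Policy evaluation: sweep the Bellman equation for the fixed
        # policy until the values stop changing (up to tol).
        V = [0.0] * n_states
        while True:
            delta = 0.0
            for s in range(n_states):
                a = policy[s]
                v = R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: Howard's rule switches *every* state to a
        # locally-improving (greedy) action simultaneously.
        new_policy = [
            max(range(n_actions),
                key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]))
            for s in range(n_states)
        ]
        if new_policy == policy:  # no improving switch exists: optimal
            return policy, V
        policy = new_policy


# Toy 2-state, 2-action MDP (deterministic transitions, made up for
# illustration): in state 0, action 1 moves to state 1 for reward 1;
# in state 1, action 0 stays for reward 2.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[0.0, 1.0],
     [2.0, 0.0]]
policy, V = policy_iteration(2, 2, P, R)
```

On this toy instance the optimal policy is to move from state 0 and stay in state 1; the iteration counts bounded in the abstract refer to the number of passes through the improvement step above.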
Cite
Text
Taraviya and Kalyanakrishnan. "A Tighter Analysis of Randomised Policy Iteration." Uncertainty in Artificial Intelligence, 2019.
Markdown
[Taraviya and Kalyanakrishnan. "A Tighter Analysis of Randomised Policy Iteration." Uncertainty in Artificial Intelligence, 2019.](https://mlanthology.org/uai/2019/taraviya2019uai-tighter/)
BibTeX
@inproceedings{taraviya2019uai-tighter,
title = {{A Tighter Analysis of Randomised Policy Iteration}},
author = {Taraviya, Meet and Kalyanakrishnan, Shivaram},
booktitle = {Uncertainty in Artificial Intelligence},
year = {2019},
pages = {519-529},
volume = {115},
url = {https://mlanthology.org/uai/2019/taraviya2019uai-tighter/}
}