Exponential Lower Bounds for Planning in MDPs with Linearly-Realizable Optimal Action-Value Functions

Weisz, Gellért; Amortila, Philip; Szepesvári, Csaba

Exponential Lower Bounds for Planning in MDPs with Linearly-Realizable Optimal Action-Value Functions

Gellért Weisz, Philip Amortila, Csaba Szepesvári

ALT 2021 pp. 1237-1264

/alt/2021/weisz2021alt-exponential/

Abstract

We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner. Previous work has left open the question of whether there exist sound planners that need only $\mbox{poly}(H,d)$ queries regardless of the MDP, where $H$ is the horizon and $d$ is the dimensionality of the features. We answer this question in the negative: we show that any sound planner must query at least $\min(e^{\Omega(d)},\Omega(2^H))$ samples in the fized-horizon setting and $e^{\Omega(d)}$ samples in the discounted setting. We also show that for any $\delta>0$, the least-squares value iteration algorithm with $\tilde{\mathcal{O}}(H^5 d^{H+1}/\delta^2)$ queries can compute a $\delta$-optimal policy in the fixed-horizon setting. We discuss implications and remaining open questions.

PDF ALT Semantic Scholar

Cite

Text

Weisz et al. "Exponential Lower Bounds for Planning in MDPs with Linearly-Realizable Optimal Action-Value Functions." Proceedings of the 32nd International Conference on Algorithmic Learning Theory, 2021.

Markdown

[Weisz et al. "Exponential Lower Bounds for Planning in MDPs with Linearly-Realizable Optimal Action-Value Functions." Proceedings of the 32nd International Conference on Algorithmic Learning Theory, 2021.](https://mlanthology.org/alt/2021/weisz2021alt-exponential/)

BibTeX

@inproceedings{weisz2021alt-exponential,
  title     = {{Exponential Lower Bounds for Planning in MDPs with Linearly-Realizable Optimal Action-Value Functions}},
  author    = {Weisz, Gellért and Amortila, Philip and Szepesvári, Csaba},
  booktitle = {Proceedings of the 32nd International Conference on Algorithmic Learning Theory},
  year      = {2021},
  pages     = {1237-1264},
  volume    = {132},
  url       = {https://mlanthology.org/alt/2021/weisz2021alt-exponential/}
}