Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff

Abstract

Motivated by the recent discovery of a statistical and computational reduction from contextual bandits to offline regression \citep{simchi2020bypassing}, we address the general (stochastic) Contextual Markov Decision Process (CMDP) problem with horizon $H$ (as known as CMDP with $H$ layers). In this paper, we introduce a reduction from CMDPs to offline density estimation under the realizability assumption, i.e., a model class $\mathcal{M}$ containing the true underlying CMDP is provided in advance. We develop an efficient, statistically near-optimal algorithm requiring only $O(H \log T)$ calls to an offline density estimation algorithm (or oracle) across all $T$ rounds. This number can be further reduced to $O(H \log \log T)$ if $T$ is known in advance. Our results mark the first efficient and near-optimal reduction from CMDPs to offline density estimation without imposing any structural assumptions on the model class. A notable feature of our algorithm is the design of a layerwise exploration-exploitation tradeoff tailored to address the layerwise structure of CMDPs. Additionally, our algorithm is versatile and applicable to pure exploration tasks in reward-free reinforcement learning.

PDF NeurIPS OpenReview Semantic Scholar

Cite

Text

Qian et al. "Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff." Neural Information Processing Systems, 2024. doi:10.52202/079017-4250

Markdown

[Qian et al. "Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/qian2024neurips-offline/) doi:10.52202/079017-4250

BibTeX

@inproceedings{qian2024neurips-offline,
  title     = {{Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff}},
  author    = {Qian, Jian and Hu, Haichen and Simchi-Levi, David},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-4250},
  url       = {https://mlanthology.org/neurips/2024/qian2024neurips-offline/}
}