Identification of Blackwell Optimal Policies for Deterministic MDPs
Abstract
This paper investigates a new learning problem, the identification of Blackwell optimal policies on deterministic MDPs (DMDPs): A learner has to return a Blackwell optimal policy with fixed confidence using a minimal number of queries. First, we characterize the maximal set of DMDPs for which the identification is possible. Then, we focus on the analysis of algorithms based on product-form confidence regions. We minimize the number of queries by efficiently visiting the state-action pairs with respect to the shape of confidence sets. Furthermore, these confidence sets are themselves optimized to achieve better performances. The performances of our methods compare to the lower bounds up to a factor $n^2$ in the worst case – where $n$ is the number of states, and constant in certain classes of DMDPs.
Cite
Text
Boone and Gaujal. "Identification of Blackwell Optimal Policies for Deterministic MDPs." Artificial Intelligence and Statistics, 2023.Markdown
[Boone and Gaujal. "Identification of Blackwell Optimal Policies for Deterministic MDPs." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/boone2023aistats-identification/)BibTeX
@inproceedings{boone2023aistats-identification,
title = {{Identification of Blackwell Optimal Policies for Deterministic MDPs}},
author = {Boone, Victor and Gaujal, Bruno},
booktitle = {Artificial Intelligence and Statistics},
year = {2023},
pages = {7392-7424},
volume = {206},
url = {https://mlanthology.org/aistats/2023/boone2023aistats-identification/}
}