Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions
Abstract
We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees that provably improve on what can be obtained solely with black-box advice.
Cite
Text
Li et al. "Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions." Neural Information Processing Systems, 2023.
Markdown
[Li et al. "Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/li2023neurips-beyond/)
BibTeX
@inproceedings{li2023neurips-beyond,
title = {{Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions}},
author = {Li, Tongxin and Lin, Yiheng and Ren, Shaolei and Wierman, Adam},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/li2023neurips-beyond/}
}