Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction

Abstract

The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning. As the computation of exact solutions to Bayesian reinforcement-learning problems is intractable, much of the literature has focused on developing suitable approximation algorithms. In this work, before diving into algorithm design, we first define, under mild structural assumptions, a complexity measure for BAMDP planning. As efficient exploration in BAMDPs hinges upon the judicious acquisition of information, our complexity measure highlights the worst-case difficulty of gathering information and exhausting epistemic uncertainty. To illustrate its significance, we present a computationally-intractable, exact planning algorithm that leverages this measure to plan more efficiently. We then conclude by introducing a specific form of state abstraction with the potential to both reduce BAMDP complexity and give rise to a computationally-tractable, approximate planning algorithm.
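
For intuition, below is a minimal, hypothetical sketch of the BAMDP "hyperstate" the abstract alludes to: the agent plans over pairs of environment state and posterior belief over the unknown dynamics. It uses a standard Dirichlet conjugate posterior for a tabular MDP; this is not the paper's algorithm, and all class and method names are invented for illustration.

import numpy as np

class DirichletBAMDPBelief:
    """Dirichlet posterior over an unknown tabular MDP's transition function."""

    def __init__(self, num_states: int, num_actions: int, prior: float = 1.0):
        # One Dirichlet per (state, action) pair; `prior` is a symmetric
        # concentration parameter encoding the initial epistemic uncertainty.
        self.counts = np.full((num_states, num_actions, num_states), prior)

    def update(self, s: int, a: int, s_next: int) -> None:
        # Conjugate posterior update: each observed transition adds one count.
        self.counts[s, a, s_next] += 1.0

    def mean_transition(self, s: int, a: int) -> np.ndarray:
        # Posterior-mean next-state distribution, usable for planning.
        return self.counts[s, a] / self.counts[s, a].sum()

    def concentration(self, s: int, a: int) -> float:
        # Total Dirichlet concentration grows with data, so its inverse is a
        # crude proxy for the epistemic uncertainty remaining at (s, a).
        return float(self.counts[s, a].sum())

# The BAMDP hyperstate pairs the physical state with this belief, so the
# value of information-gathering actions is reflected in the planning state.
belief = DirichletBAMDPBelief(num_states=3, num_actions=2)
belief.update(s=0, a=1, s_next=2)
print(belief.mean_transition(0, 1), belief.concentration(0, 1))

In this view, "exhausting epistemic uncertainty" corresponds to the posterior concentrating as transition counts accumulate, which is roughly the kind of worst-case information-gathering difficulty the paper's complexity measure is meant to capture.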

Cite

Text

Arumugam and Singh. "Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction." Neural Information Processing Systems, 2022.

Markdown

[Arumugam and Singh. "Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/arumugam2022neurips-planning/)

BibTeX

@inproceedings{arumugam2022neurips-planning,
  title     = {{Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction}},
  author    = {Arumugam, Dilip and Singh, Satinder P.},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/arumugam2022neurips-planning/}
}