Belief-Dependent Macro-Action Discovery in POMDPs Using the Value of Information

Abstract

This work introduces macro-action discovery using the value of information (VoI) for robust and efficient planning in partially observable Markov decision processes (POMDPs). POMDPs are a powerful framework for planning under uncertainty. Previous approaches have used high-level macro-actions within POMDP policies to reduce planning complexity. However, macro-action design is often heuristic and rarely comes with performance guarantees. Here, we present a method for extracting belief-dependent, variable-length macro-actions directly from a low-level POMDP model. We construct macro-actions by chaining sequences of open-loop actions together when the task-specific VoI, defined as the change in expected task performance caused by observations in the current planning iteration, is low. Importantly, we provide performance guarantees on the resulting VoI macro-action policies in the form of bounded regret relative to the optimal policy. In simulated tracking experiments, we achieve higher reward than both closed-loop and hand-coded macro-action baselines, selectively using VoI macro-actions to reduce planning complexity while maintaining near-optimal task performance.
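
The idea of chaining open-loop actions when VoI is low can be illustrated with a minimal one-step sketch. This is not the paper's algorithm. It assumes a small discrete problem with a hypothetical reward table `reward[a][s]`, a hypothetical observation model `obs_model[o][s] = P(o | s)`, and a hypothetical `threshold`. It uses the classic one-step definition of VoI: the best expected reward when acting after an observation, minus the best expected reward when acting on the current belief alone.

```python
def open_loop_value(belief, reward):
    """Best expected reward when acting on the belief without observing."""
    return max(sum(b * r for b, r in zip(belief, row)) for row in reward)


def closed_loop_value(belief, reward, obs_model):
    """Expected reward when the action may depend on one observation.

    obs_model[o][s] is P(o | s). For each observation o, the joint
    P(o, s) = belief[s] * P(o | s) is formed and the best action for
    that observation is chosen. Summing over o gives the expectation.
    """
    total = 0.0
    for obs_row in obs_model:
        joint = [b * p for b, p in zip(belief, obs_row)]
        total += max(sum(j * r for j, r in zip(joint, row)) for row in reward)
    return total


def value_of_information(belief, reward, obs_model):
    """One-step VoI: gain in expected reward from observing before acting."""
    return closed_loop_value(belief, reward, obs_model) - open_loop_value(belief, reward)


def should_extend_macro_action(belief, reward, obs_model, threshold):
    """Illustrative rule: keep acting open-loop while VoI is at most threshold."""
    return value_of_information(belief, reward, obs_model) <= threshold


# Hypothetical two-state problem: action i earns reward 1 in state i, else 0.
reward = [[1.0, 0.0], [0.0, 1.0]]
# The observation reports the true state with probability 0.9.
obs_model = [[0.9, 0.1], [0.1, 0.9]]

uncertain = [0.5, 0.5]
confident = [0.95, 0.05]

voi_uncertain = value_of_information(uncertain, reward, obs_model)
voi_confident = value_of_information(confident, reward, obs_model)
```

In this toy problem the uncertain belief has a VoI of about 0.4, so an observation is worth waiting for. The confident belief has a VoI of about 0, so under this illustrative rule the planner would keep extending the open-loop sequence.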

Cite

Text

Flaspohler et al. "Belief-Dependent Macro-Action Discovery in POMDPs Using the Value of Information." Neural Information Processing Systems, 2020.

Markdown

[Flaspohler et al. "Belief-Dependent Macro-Action Discovery in POMDPs Using the Value of Information." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/flaspohler2020neurips-beliefdependent/)

BibTeX

@inproceedings{flaspohler2020neurips-beliefdependent,
  title     = {{Belief-Dependent Macro-Action Discovery in POMDPs Using the Value of Information}},
  author    = {Flaspohler, Genevieve and Roy, Nicholas A. and Fisher, III, John W.},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/flaspohler2020neurips-beliefdependent/}
}