Hindsight Optimization for Hybrid State and Action MDPs
Abstract
Hybrid (mixed discrete and continuous) state and action Markov Decision Processes (HSA-MDPs) provide an expressive formalism for modeling stochastic and concurrent sequential decision-making problems. Existing solvers for HSA-MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection. Our main contribution is a linear-time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective when the dynamics are specified as location-scale probability distributions parametrized by Piecewise Linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower bound generated by straight-line plans. Our empirical results show that the HSA-HOP approach effectively scales to high-dimensional problems and outperforms baselines that are capable of scaling to such large hybrid MDPs.
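To make the HOP idea concrete, below is a minimal sketch on a toy one-dimensional problem. The Gaussian noise gives a location-scale transition whose location function is (piecewise) linear in state and action, in the spirit of the class described in the abstract; a brute-force search over action sequences stands in for the paper's MILP solve of each determinized future. All names and the toy dynamics here are illustrative assumptions, not the paper's implementation.

```python
import itertools
import random

def step(x, u, noise):
    """Deterministic transition once the noise is fixed by a hindsight sample.
    Location-scale form: x' = f(x, u) + noise, with f (piecewise) linear."""
    return x + u + noise

def reward(x, u):
    """Piecewise-linear reward, matching the PWL class assumed above."""
    return -abs(x)

def plan_value(x0, plan, noises):
    """Cumulative reward of a fixed action sequence under one sampled future."""
    x, total = x0, 0.0
    for u, w in zip(plan, noises):
        total += reward(x, u)
        x = step(x, u, w)
    return total

def hop_action(x0, actions, horizon, n_samples, sigma, rng):
    """Hindsight-optimization action selection: sample determinized futures,
    optimize a full plan per future (brute force here; a MILP in the paper),
    and average the per-first-action optima to get an upper bound on Q*."""
    q = {a: 0.0 for a in actions}
    for _ in range(n_samples):
        noises = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        best = {a: float("-inf") for a in actions}
        for plan in itertools.product(actions, repeat=horizon):
            best[plan[0]] = max(best[plan[0]], plan_value(x0, plan, noises))
        for a in actions:
            q[a] += best[a] / n_samples
    return max(q, key=q.get)

if __name__ == "__main__":
    rng = random.Random(0)
    acts = [-1.0, -0.5, 0.0, 0.5, 1.0]  # coarse grid over a hybrid action
    print(hop_action(x0=2.3, actions=acts, horizon=3,
                     n_samples=25, sigma=0.3, rng=rng))
```

Because the maximization over plans happens inside the expectation over sampled futures, the averaged value overestimates the true finite-horizon value; this is exactly why HOP yields an upper bound. The straight-line-plan variant mentioned in the abstract instead commits to a single action sequence evaluated across futures, which yields a lower bound.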
Cite
Text
Raghavan et al. "Hindsight Optimization for Hybrid State and Action MDPs." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.11056
Markdown
[Raghavan et al. "Hindsight Optimization for Hybrid State and Action MDPs." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/raghavan2017aaai-hindsight/) doi:10.1609/AAAI.V31I1.11056
BibTeX
@inproceedings{raghavan2017aaai-hindsight,
  title     = {{Hindsight Optimization for Hybrid State and Action MDPs}},
  author    = {Raghavan, Aswin and Sanner, Scott and Khardon, Roni and Tadepalli, Prasad and Fern, Alan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {3790-3796},
  doi       = {10.1609/AAAI.V31I1.11056},
  url       = {https://mlanthology.org/aaai/2017/raghavan2017aaai-hindsight/}
}