QMDP-Net: Deep Learning for Planning Under Partial Observability

Abstract

This paper introduces the QMDP-net, a neural network architecture for planning under partial observability. The QMDP-net combines the strengths of model-free learning and model-based planning. It is a recurrent policy network, but it represents a policy for a parameterized set of tasks by connecting a model with a planning algorithm that solves the model, thus embedding the solution structure of planning in a network learning architecture. The QMDP-net is fully differentiable and allows for end-to-end training. We train a QMDP-net on different tasks so that it can generalize to new ones in the parameterized task set and “transfer” to other similar tasks beyond the set. In preliminary experiments, QMDP-net showed strong performance on several robotic tasks in simulation. Interestingly, while QMDP-net encodes the QMDP algorithm, it sometimes outperforms the QMDP algorithm in the experiments, as a result of end-to-end learning.
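
The architecture described above is easiest to see in terms of the underlying QMDP computation. The following is a minimal sketch, not the authors' implementation: it assumes a discrete POMDP given by a transition tensor T, an observation tensor Z, and rewards R (all names here are illustrative), and shows the three differentiable pieces that the QMDP-net embeds as network layers, written with jax.numpy so gradients can flow through them.

import jax.numpy as jnp

def value_iteration(T, R, gamma=0.99, k=50):
    # K steps of value iteration on the underlying MDP.
    # T: (A, S, S) with T[a, s, s'] = P(s' | s, a); R: (A, S) rewards.
    # Returns Q: (A, S) action values.
    V = jnp.zeros(T.shape[-1])
    for _ in range(k):
        Q = R + gamma * jnp.einsum("ast,t->as", T, V)  # Bellman backup
        V = Q.max(axis=0)
    return Q

def belief_update(b, T, Z, a, o):
    # One Bayes filter step after taking action a and observing o.
    # b: (S,) belief; Z: (A, S, O) with Z[a, s', o] = P(o | s', a).
    b_pred = T[a].T @ b           # predict: propagate belief through T
    b_new = b_pred * Z[a, :, o]   # correct: weight by observation likelihood
    return b_new / b_new.sum()    # renormalize

def qmdp_action_values(b, Q):
    # QMDP approximation: Q(b, a) = sum_s b(s) * Q(s, a).
    return Q @ b

In the QMDP-net, these pieces become layers of a single recurrent network: T, Z, and R are learned weights conditioned on the task parameter, the Bayes filter and the K-step planner form the recurrent core, and because every operation above is differentiable, the whole policy can be trained end-to-end. That is also why the learned model can sometimes outperform the exact QMDP algorithm run on the true model.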

Cite

Text

Karkus et al. "QMDP-Net: Deep Learning for Planning Under Partial Observability." Neural Information Processing Systems, 2017.

Markdown

[Karkus et al. "QMDP-Net: Deep Learning for Planning Under Partial Observability." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/karkus2017neurips-qmdpnet/)

BibTeX

@inproceedings{karkus2017neurips-qmdpnet,
  title     = {{QMDP-Net: Deep Learning for Planning Under Partial Observability}},
  author    = {Karkus, Peter and Hsu, David and Lee, Wee Sun},
  booktitle = {Neural Information Processing Systems},
  year      = {2017},
  pages     = {4694--4704},
  url       = {https://mlanthology.org/neurips/2017/karkus2017neurips-qmdpnet/}
}