Less Suboptimal Learning and Control in Variational POMDPs
Abstract
A recently uncovered pitfall in learning generative models with amortised variational inference, the conditioning gap, calls into question common practices in model-based reinforcement learning. Withholding from the inference network part of the quantities that the true posterior depends on leads to a biased generative model and an approximate posterior that underestimates uncertainty. We examine the effect of the conditioning gap on model-based reinforcement learning with variational world models. We study the effect in three settings with known dynamics, which enables comparison to a near-optimal policy. We find that the impact of the conditioning gap becomes severe in systems whose state is hard to estimate.
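The uncertainty underestimation described above can be illustrated in a toy model. The sketch below is not from the paper; it assumes a hypothetical linear-Gaussian setup with latent z ~ N(0, 1) and two conditionally independent observations x1, x2 ~ N(z, s2). An amortised posterior q(z | x1) that is withheld x2 is, under the ELBO, fit by minimising the expected reverse KL to the full posterior p(z | x1, x2); the optimal variance then matches the fully conditioned posterior, which is smaller than the variance of the correct partial posterior p(z | x1).

```python
import numpy as np

# Hypothetical linear-Gaussian model (illustration only, not the paper's setup):
#   z ~ N(0, 1),  x1 ~ N(z, s2),  x2 ~ N(z, s2)
s2 = 1.0  # assumed observation noise variance

var_full = s2 / (2.0 + s2)     # variance of p(z | x1, x2)
var_partial = s2 / (1.0 + s2)  # variance of p(z | x1): what q(z | x1) *should* report

# Empirically recover the ELBO-optimal variance of a Gaussian q(z | x1):
# minimise E_{x2 | x1} KL( N(m, v) || p(z | x1, x2) ) over v by grid search.
rng = np.random.default_rng(0)
x1 = 0.7                               # arbitrary observed value
m2 = x1 / (1.0 + s2)                   # mean of p(x2 | x1)
v2 = s2 / (1.0 + s2) + s2              # variance of p(x2 | x1)
x2 = rng.normal(m2, np.sqrt(v2), size=100_000)

mu = (x1 + x2) / (2.0 + s2)            # mean of p(z | x1, x2) for each sampled x2
s = var_full                           # its variance (constant in x1, x2)

def expected_kl(m, v):
    # E_{x2 | x1} KL( N(m, v) || N(mu, s) ), Gaussian KL in closed form
    return np.mean(0.5 * (v / s + (m - mu) ** 2 / s - 1.0 + np.log(s / v)))

grid = np.linspace(0.05, 1.0, 200)
best_v = grid[np.argmin([expected_kl(mu.mean(), v) for v in grid])]

print(f"variance of p(z | x1):      {var_partial:.3f}")
print(f"ELBO-optimal q variance:    {best_v:.3f}")
assert best_v < var_partial  # q underestimates its own posterior uncertainty
```

The grid minimum lands at the full-posterior variance s2 / (2 + s2) rather than at s2 / (1 + s2): the partially conditioned q is pulled toward the sharpness of the full posterior, precisely the underestimated uncertainty the abstract refers to.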
Cite
Anonymous. "Less Suboptimal Learning and Control in Variational POMDPs." ICLR 2021 Workshops: SSL-RL, 2021.
BibTeX
@inproceedings{anonymous2021iclrw-less,
title = {{Less Suboptimal Learning and Control in Variational POMDPs}},
author = {Anonymous},
booktitle = {ICLR 2021 Workshops: SSL-RL},
year = {2021},
url = {https://mlanthology.org/iclrw/2021/anonymous2021iclrw-less/}
}