Learning Proposals for Sequential Importance Samplers Using Reinforced Variational Inference

Abstract

The problem of inferring unobserved values in a partially observed trajectory from a stochastic process can be viewed as a structured prediction problem. Traditionally, inference is conducted using heuristic-based Monte Carlo methods. This work considers learning such heuristics by leveraging a connection between policy-optimization reinforcement learning and approximate inference. In particular, we learn the proposal distributions used in sequential importance samplers by casting proposal learning as a variational inference problem. We then rewrite the variational lower bound as a policy optimization problem, similar to Weber et al. (2015), allowing us to transfer techniques from reinforcement learning. We apply this technique to a simple stochastic process as a proof of concept and show that while the approach is viable, it will require more engineering effort to scale to inference for rare observations.
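
To make the idea concrete, below is a minimal sketch (not the authors' code) of reinforced variational inference for a proposal on a toy Gaussian random walk conditioned on its endpoint: the per-step drift of the proposal plays the role of the policy, the reward for a sampled trajectory is its log importance weight, and the variational lower bound is ascended with a REINFORCE-style score-function gradient. The random-walk model, the drift parameterization, and all hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

T = 20            # trajectory length (assumed)
y_obs = 8.0       # observed endpoint we condition on (assumed)
obs_sigma = 0.5   # observation noise (assumed)

def rollout(drift):
    """Sample one trajectory from the proposal q and return it with log q (constants dropped)."""
    x, xs, logq = 0.0, [], 0.0
    for t in range(T):
        step = rng.normal(drift[t], 1.0)        # q(x_t | x_{t-1}) = N(x_{t-1} + drift_t, 1)
        logq += -0.5 * (step - drift[t]) ** 2
        x += step
        xs.append(x)
    return np.array(xs), logq

def log_joint(xs):
    """log p(trajectory, observation): unit-variance random-walk prior plus endpoint likelihood."""
    steps = np.diff(np.concatenate(([0.0], xs)))
    return -0.5 * np.sum(steps ** 2) - 0.5 * ((xs[-1] - y_obs) / obs_sigma) ** 2

drift = np.zeros(T)           # proposal parameters, i.e. the "policy"
lr, batch = 0.01, 64

for it in range(500):
    grads, rewards = [], []
    for _ in range(batch):
        xs, logq = rollout(drift)
        steps = np.diff(np.concatenate(([0.0], xs)))
        rewards.append(log_joint(xs) - logq)    # reward = log importance weight
        grads.append(steps - drift)             # gradient of log q w.r.t. drift for this trajectory
    rewards = np.array(rewards)
    baseline = rewards.mean()                   # simple baseline for variance reduction
    # REINFORCE / score-function estimate of the variational lower-bound gradient
    drift += lr * np.mean((rewards - baseline)[:, None] * np.array(grads), axis=0)

print(f"mean learned drift: {drift.mean():.2f} (roughly y_obs / T = {y_obs / T:.2f})")

In this sketch the score-function gradient is exactly the policy-gradient form the abstract refers to, with the log importance weight serving as the return; after training, trajectories from the learned proposal concentrate near the conditioned endpoint instead of relying on the prior random walk to reach it by chance.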

Cite

Text

Ahmed et al. "Learning Proposals for Sequential Importance Samplers Using Reinforced Variational Inference." ICLR 2019 Workshops: drlStructPred, 2019.

Markdown

[Ahmed et al. "Learning Proposals for Sequential Importance Samplers Using Reinforced Variational Inference." ICLR 2019 Workshops: drlStructPred, 2019.](https://mlanthology.org/iclrw/2019/ahmed2019iclrw-learning/)

BibTeX

@inproceedings{ahmed2019iclrw-learning,
  title     = {{Learning Proposals for Sequential Importance Samplers Using Reinforced Variational Inference}},
  author    = {Ahmed, Zafarali and Karuvally, Arjun and Precup, Doina and Gravel, Simon},
  booktitle = {ICLR 2019 Workshops: drlStructPred},
  year      = {2019},
  url       = {https://mlanthology.org/iclrw/2019/ahmed2019iclrw-learning/}
}