Variational Inference with Tail-Adaptive F-Divergence

Abstract

Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to the Kullback-Leibler (KL) divergence, a major advantage of α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires importance sampling, whose estimates can have extremely large or even infinite variance due to the heavy tails of the importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments while simultaneously achieving the mass-covering property. We test our method on Bayesian neural networks, as well as on deep reinforcement learning, where it is applied to improve the recent soft actor-critic (SAC) algorithm (Haarnoja et al., 2018). Our results show that our approach yields significant advantages over existing methods based on the classical KL and α-divergences.
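
The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of the kind of tail-adaptive weighting the abstract describes. A Gaussian variational distribution q is fit to a hypothetical 2-D target log_p, and each sample's gradient contribution is weighted using the empirical tail probability of its importance weight (rank-based, raised to a power β = -1) rather than a fixed power w^α. The target, the choice β = -1, and all variable names are illustrative assumptions, not necessarily the exact estimator from the paper.

import math
import torch

def log_p(x):
    # Hypothetical unnormalized target: a correlated 2-D Gaussian.
    cov = torch.tensor([[1.0, 0.8], [0.8, 1.0]])
    return -0.5 * torch.einsum('ni,ij,nj->n', x, torch.inverse(cov), x)

mu = torch.zeros(2, requires_grad=True)
log_sigma = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

n = 256
for step in range(1000):
    eps = torch.randn(n, 2)
    x = mu + torch.exp(log_sigma) * eps                   # reparameterized samples from q
    log_q = (-0.5 * ((x - mu) / torch.exp(log_sigma)) ** 2
             - log_sigma - 0.5 * math.log(2 * math.pi)).sum(-1)
    log_w = (log_p(x) - log_q).detach()                   # log importance weights, no gradient

    # Tail-adaptive weights: rho_i = (1/n) sum_j 1{w_j >= w_i} is the empirical tail
    # probability of weight i; raising it to beta = -1 (an illustrative choice) gives
    # weights that depend only on the ranks of the importance weights.
    rho = (log_w[None, :] >= log_w[:, None]).float().mean(1)
    gamma = rho ** (-1.0)
    gamma = gamma / gamma.sum()                           # normalize to sum to 1

    # Surrogate loss whose reparameterization gradient is the weighted per-sample gradient.
    loss = -(gamma * (log_p(x) - log_q)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

Because the weights depend only on the ranks of the importance weights, they remain bounded for any finite sample no matter how heavy-tailed p/q is, which is the intuition behind the finite-moment guarantee claimed in the abstract.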

Cite

Text

Wang et al. "Variational Inference with Tail-Adaptive F-Divergence." Neural Information Processing Systems, 2018.

Markdown

[Wang et al. "Variational Inference with Tail-Adaptive F-Divergence." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/wang2018neurips-variational/)

BibTeX

@inproceedings{wang2018neurips-variational,
  title     = {{Variational Inference with Tail-Adaptive F-Divergence}},
  author    = {Wang, Dilin and Liu, Hao and Liu, Qiang},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {5737--5747},
  url       = {https://mlanthology.org/neurips/2018/wang2018neurips-variational/}
}