Minimal Random Code Learning with Mean-KL Parameterization

Abstract

This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}}$. Instead, we parameterize $Q_{\mathbf{w}}$ by its mean and KL divergence from $P_{\mathbf{w}}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational training with Mean-KL parameterization converges twice as fast and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and compressed weight samples which are more robust to pruning.
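To make the Mean-KL idea concrete, the sketch below shows one way the variance of a Gaussian variational factor can be recovered from its mean and a per-weight KL budget, so that $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ hits the target by construction. This is a minimal illustration under assumed simplifications (a factorized Gaussian $Q$, a zero-mean Gaussian coding distribution $P$, and a Lambert-W branch choice); it is not necessarily the exact procedure used in the paper, and the function name and mean-constraint handling are illustrative.

```python
import numpy as np
from scipy.special import lambertw


def std_from_mean_and_kl(mu, kl_target, sigma_p=1.0, branch=0):
    """Recover sigma such that KL[N(mu, sigma^2) || N(0, sigma_p^2)] = kl_target.

    With r = sigma^2 / sigma_p^2, the Gaussian KL is
        KL = 0.5 * (r + mu^2 / sigma_p^2 - 1 - ln r),
    so setting KL = kl_target gives  r - ln r = c  with
        c = 2 * kl_target + 1 - mu^2 / sigma_p^2,
    which is solved in closed form by the Lambert W function:
        r = -W_k(-exp(-c)),  k = 0 gives r <= 1, k = -1 gives r >= 1.
    """
    c = 2.0 * kl_target + 1.0 - (mu / sigma_p) ** 2
    # The equation r - ln r = c only has a solution for c >= 1, i.e. the mean
    # must stay within the KL budget: |mu| <= sigma_p * sqrt(2 * kl_target).
    assert np.all(c >= 1.0), "mean too large for the requested KL budget"
    r = np.real(-lambertw(-np.exp(-c), k=branch))
    return sigma_p * np.sqrt(r)


# Example: a weight with mean 0.3 and a per-weight budget of 0.5 nats.
sigma = std_from_mean_and_kl(0.3, kl_target=0.5)
```

Because the variance is derived from (mean, KL) rather than trained directly, the total coding cost stays fixed at the chosen budget throughout training, which is the mechanism that removes the need for the KL-annealing schedule required by the Mean-Var parameterization.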

Cite

Text

Lin et al. "Minimal Random Code Learning with Mean-KL Parameterization." ICML 2023 Workshops: NCW, 2023.

Markdown

[Lin et al. "Minimal Random Code Learning with Mean-KL Parameterization." ICML 2023 Workshops: NCW, 2023.](https://mlanthology.org/icmlw/2023/lin2023icmlw-minimal/)

BibTeX

@inproceedings{lin2023icmlw-minimal,
  title     = {{Minimal Random Code Learning with Mean-KL Parameterization}},
  author    = {Lin, Jihao Andreas and Flamich, Gergely and Hernández-Lobato, José Miguel},
  booktitle = {ICML 2023 Workshops: NCW},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/lin2023icmlw-minimal/}
}