Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines

Abstract

This paper introduces the Metric-Free Natural Gradient (MFNG) algorithm for training Boltzmann Machines. Similar in spirit to the Hessian-Free method of Martens [8], our algorithm belongs to the family of truncated Newton methods and exploits an efficient matrix-vector product to avoid explicitly storing the natural gradient metric $L$. This metric is shown to be the expected second derivative of the log-partition function (under the model distribution), or equivalently, the variance of the vector of partial derivatives of the energy function. We evaluate our method on the task of joint-training a 3-layer Deep Boltzmann Machine and show that MFNG does indeed have faster per-epoch convergence compared to Stochastic Maximum Likelihood with centering, though wall-clock performance is currently not competitive.
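The matrix-vector product that makes the method "metric-free" can be sketched from the abstract's identity. Because the energy of a Boltzmann machine is linear in its parameters, the metric satisfies $L = \nabla_\theta^2 \log Z = \mathrm{Var}_{x \sim p}\!\left[\partial E(x;\theta)/\partial\theta\right]$, so $Lv$ can be estimated from model samples without ever forming the $P \times P$ matrix, and the resulting linear system can be handed to a truncated conjugate-gradient solver, as in Hessian-Free optimization. Below is a minimal NumPy/SciPy sketch of this idea, not the authors' implementation; the function names `metric_vector_product` and `natural_gradient_step`, the damping constant, and the CG iteration budget are all illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def metric_vector_product(G, v):
    """Estimate L @ v from per-sample energy gradients without forming L.

    G : (N, P) array whose rows are dE(x_n; theta)/dtheta for N samples
        drawn from the model; L is the covariance of these rows.
    """
    g_bar = G.mean(axis=0)
    Gv = G @ v                                    # (N,) projections onto v
    return G.T @ Gv / G.shape[0] - g_bar * (g_bar @ v)

def natural_gradient_step(G, grad, damping=1e-2, maxiter=20):
    """Solve (L + damping * I) x = grad by truncated conjugate gradient,
    yielding an approximate natural-gradient direction x (hypothetical
    hyperparameter values)."""
    P = G.shape[1]
    A = LinearOperator(
        (P, P), matvec=lambda v: metric_vector_product(G, v) + damping * v)
    x, _ = cg(A, grad, maxiter=maxiter)
    return x

# Toy usage with random stand-ins for the per-sample energy gradients and
# the log-likelihood gradient of a model with P = 30 parameters.
rng = np.random.default_rng(0)
G = rng.standard_normal((500, 30))
grad = rng.standard_normal(30)
step = natural_gradient_step(G, grad)
```

Since $L$ is a covariance matrix it is symmetric positive semi-definite, so damped CG applies directly; truncating CG after a few iterations mirrors the truncated-Newton structure the abstract describes.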

Cite

Text

Desjardins et al. "Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines." International Conference on Learning Representations, 2013.

Markdown

[Desjardins et al. "Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines." International Conference on Learning Representations, 2013.](https://mlanthology.org/iclr/2013/desjardins2013iclr-metric/)

BibTeX

@inproceedings{desjardins2013iclr-metric,
  title     = {{Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines}},
  author    = {Desjardins, Guillaume and Pascanu, Razvan and Courville, Aaron C. and Bengio, Yoshua},
  booktitle = {International Conference on Learning Representations},
  year      = {2013},
  url       = {https://mlanthology.org/iclr/2013/desjardins2013iclr-metric/}
}