From PAC-Bayes Bounds to KL Regularization

Abstract

We show that convex KL-regularized objective functions are obtained from a PAC-Bayes risk bound when using convex loss functions for the stochastic Gibbs classifier that upper-bound the standard zero-one loss used for the weighted majority vote. By restricting ourselves to a class of posteriors that we call quasi-uniform, we propose a simple coordinate descent learning algorithm to minimize the proposed KL-regularized cost function. We show that standard ℓp-regularized objective functions currently in use, such as ridge regression and ℓp-regularized boosting, are obtained from a relaxation of the KL divergence between the quasi-uniform posterior and the uniform prior. We present numerical experiments where the proposed learning algorithm generally outperforms ridge regression and AdaBoost.
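
To make the abstract's objective concrete, here is a minimal, illustrative Python sketch of minimizing a KL-regularized cost over posterior weights on a set of base voters by coordinate descent. It is not the authors' algorithm: the squared surrogate loss, the grid line search, the trade-off constant `C`, and the plain simplex constraint (in place of the paper's quasi-uniform posterior class) are all simplifying assumptions, and every function name is hypothetical.

```python
import numpy as np

def kl_to_uniform(q):
    """KL(q || uniform) for a distribution q over n voters."""
    n = len(q)
    q = np.clip(q, 1e-12, None)
    return float(np.sum(q * np.log(q * n)))

def objective(q, H, y, C):
    """C * (empirical convex surrogate risk of the q-weighted vote)
    + KL(q || uniform).  The squared loss (1 - margin)^2 is one convex
    upper bound on the zero-one loss, standing in for the paper's bound."""
    margins = y * (H @ q)           # signed margins of the weighted vote
    risk = np.mean((1.0 - margins) ** 2)
    return C * risk + kl_to_uniform(q)

def coordinate_descent(H, y, C=10.0, sweeps=50,
                       grid=np.linspace(0.0, 1.0, 51)):
    """Greedy coordinate descent: sweep the voters, line-search one
    weight at a time over a grid, renormalizing to stay on the simplex."""
    n = H.shape[1]
    q = np.full(n, 1.0 / n)         # start at the uniform posterior
    for _ in range(sweeps):
        for j in range(n):
            best_v, best_obj = q[j], objective(q, H, y, C)
            for v in grid:
                trial = q.copy()
                trial[j] = v
                trial /= trial.sum()
                obj = objective(trial, H, y, C)
                if obj < best_obj:
                    best_v, best_obj = v, obj
            q[j] = best_v
            q /= q.sum()
    return q

# Tiny demo on synthetic data: sign-of-feature "stumps" as voters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])
H = np.sign(X)                      # each column is one ±1 voter
q = coordinate_descent(H, y)
print("posterior weights:", np.round(q, 3))
```

The grid line search keeps the sketch dependency-free and guarantees a monotone decrease of the objective; a practical implementation would instead use the update rules the paper derives for its quasi-uniform posteriors.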

Cite

Text

Germain et al. "From PAC-Bayes Bounds to KL Regularization." Neural Information Processing Systems, 2009.

Markdown

[Germain et al. "From PAC-Bayes Bounds to KL Regularization." Neural Information Processing Systems, 2009.](https://mlanthology.org/neurips/2009/germain2009neurips-pacbayes/)

BibTeX

@inproceedings{germain2009neurips-pacbayes,
  title     = {{From PAC-Bayes Bounds to KL Regularization}},
  author    = {Germain, Pascal and Lacasse, Alexandre and Marchand, Mario and Shanian, Sara and Laviolette, François},
  booktitle = {Neural Information Processing Systems},
  year      = {2009},
  pages     = {603--610},
  url       = {https://mlanthology.org/neurips/2009/germain2009neurips-pacbayes/}
}