Near-Optimal Smoothing of Structured Conditional Probability Matrices

Abstract

Utilizing the structure of a probabilistic model can significantly increase its learning speed. Motivated by several recent applications, in particular bigram models in language processing, we consider learning low-rank conditional probability matrices under expected KL-risk. This choice makes smoothing, that is, the careful handling of low-probability elements, paramount. We derive an iterative algorithm that extends classical non-negative matrix factorization to naturally incorporate additive smoothing and prove that it converges to the stationary points of a penalized empirical risk. We then derive sample-complexity bounds for the global minimizer of the penalized risk and show that it is within a small factor of the optimal sample complexity. This framework generalizes to more sophisticated smoothing techniques, including absolute-discounting.
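
To make the setup concrete, here is a minimal illustrative sketch, not the authors' algorithm: it simply pairs standard KL-divergence NMF multiplicative updates (Lee–Seung) with add-beta (additive) smoothing of empirical bigram counts. The rank `k`, smoothing constant `beta`, and the function name `smoothed_lowrank_bigram` are placeholder choices introduced only for illustration.

```python
# Illustrative sketch only -- not the paper's algorithm. Combines add-beta
# smoothing of bigram counts with classical KL-NMF multiplicative updates.
import numpy as np

def smoothed_lowrank_bigram(counts, k=10, beta=0.5, iters=200, eps=1e-12):
    """Estimate a low-rank conditional probability matrix P(col | row).

    counts: (V, V) matrix of bigram counts C[i, j] = #(word i followed by word j).
    Returns W @ H, row-normalized so each row is a probability distribution.
    """
    V = counts.shape[0]
    # Additive smoothing: pretend every bigram was observed beta extra times.
    C = counts + beta
    rng = np.random.default_rng(0)
    W = rng.random((V, k)) + eps
    H = rng.random((k, V)) + eps

    for _ in range(iters):
        # Multiplicative updates for the generalized KL objective D(C || WH).
        WH = W @ H + eps
        W *= (C / WH) @ H.T / (H.sum(axis=1) + eps)
        WH = W @ H + eps
        H *= W.T @ (C / WH) / (W.sum(axis=0)[:, None] + eps)

    P = W @ H
    return P / P.sum(axis=1, keepdims=True)  # each row sums to 1

if __name__ == "__main__":
    counts = np.array([[5, 0, 1], [2, 3, 0], [0, 1, 4]], dtype=float)
    print(smoothed_lowrank_bigram(counts, k=2).round(3))
```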

Cite

Text

Falahatgar et al. "Near-Optimal Smoothing of Structured Conditional Probability Matrices." Neural Information Processing Systems, 2016.

Markdown

[Falahatgar et al. "Near-Optimal Smoothing of Structured Conditional Probability Matrices." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/falahatgar2016neurips-nearoptimal/)

BibTeX

@inproceedings{falahatgar2016neurips-nearoptimal,
  title     = {{Near-Optimal Smoothing of Structured Conditional Probability Matrices}},
  author    = {Falahatgar, Moein and Ohannessian, Mesrob I and Orlitsky, Alon},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {4860--4868},
  url       = {https://mlanthology.org/neurips/2016/falahatgar2016neurips-nearoptimal/}
}