Are You Using Test Log-Likelihood Correctly?

Abstract

Test log-likelihood is commonly used to compare different models of the same data and different approximate inference algorithms for fitting the same probabilistic model. We present simple examples demonstrating how comparisons based on test log-likelihood can contradict comparisons according to other objectives. Specifically, our examples show that (i) conclusions about forecast accuracy based on test log-likelihood comparisons may not agree with conclusions based on other distributional quantities like means; and (ii) approximate Bayesian inference algorithms that attain higher test log-likelihoods need not also yield more accurate posterior approximations.
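Point (i) can be made concrete with a toy example (an illustrative sketch, not one of the paper's experiments): a Gaussian predictive with a biased mean but well-calibrated variance can attain a higher average test log-likelihood than a predictive with the correct mean but badly overdispersed variance, so the two criteria rank the models in opposite orders.

```python
# Illustrative sketch (assumed setup, not from the paper): test log-likelihood
# and predictive-mean accuracy can disagree about which model is "better".
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=100_000)  # held-out data, truly N(0, 1)

def avg_gaussian_loglik(y, mu, sigma):
    """Average log-density of y under a Gaussian predictive N(mu, sigma^2)."""
    return np.mean(
        -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)
    )

# Model A: biased predictive mean (0.5), well-calibrated variance (1).
ll_a = avg_gaussian_loglik(y, mu=0.5, sigma=1.0)
mse_a = np.mean((y - 0.5) ** 2)  # squared error of A's point prediction

# Model B: correct predictive mean (0), badly overdispersed variance (9).
ll_b = avg_gaussian_loglik(y, mu=0.0, sigma=3.0)
mse_b = np.mean((y - 0.0) ** 2)  # squared error of B's point prediction

print(f"avg test log-lik: A={ll_a:.3f}, B={ll_b:.3f}")   # A is higher
print(f"MSE of pred. mean: A={mse_a:.3f}, B={mse_b:.3f}")  # B is lower
```

Here test log-likelihood prefers model A while mean-squared error of the predictive mean prefers model B, mirroring the abstract's first claim.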

Cite

Text

Deshpande et al. "Are You Using Test Log-Likelihood Correctly?" NeurIPS 2022 Workshops: ICBINB, 2022.

Markdown

[Deshpande et al. "Are You Using Test Log-Likelihood Correctly?" NeurIPS 2022 Workshops: ICBINB, 2022.](https://mlanthology.org/neuripsw/2022/deshpande2022neuripsw-you/)

BibTeX

@inproceedings{deshpande2022neuripsw-you,
  title     = {{Are You Using Test Log-Likelihood Correctly?}},
  author    = {Deshpande, Sameer and Ghosh, Soumya and Nguyen, Tin D. and Broderick, Tamara},
  booktitle = {NeurIPS 2022 Workshops: ICBINB},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/deshpande2022neuripsw-you/}
}