Laplace Sample Information: Data Informativeness Through a Bayesian Lens

Abstract

Accurately estimating the informativeness of individual samples in a dataset is an important objective in deep learning, as it can guide sample selection, which can improve model efficiency and accuracy by removing redundant or potentially harmful samples. We propose $\textit{Laplace Sample Information}$ ($\mathsf{LSI}$), a measure of sample informativeness grounded in information theory that is widely applicable across model architectures and learning settings. $\mathsf{LSI}$ leverages a Bayesian approximation to the weight posterior and the KL divergence to measure the change in the parameter distribution induced by a sample of interest from the dataset. We experimentally show that $\mathsf{LSI}$ is effective in ordering the data with respect to typicality, detecting mislabeled samples, measuring class-wise informativeness, and assessing dataset difficulty. We demonstrate these capabilities of $\mathsf{LSI}$ on image and text data in supervised and unsupervised settings. Moreover, we show that $\mathsf{LSI}$ can be computed efficiently through probes and transfers well to the training of large models.
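
To make the abstract's description concrete, here is a minimal, self-contained sketch of the idea, not the authors' implementation. It assumes, purely for illustration, that the $\mathsf{LSI}$ of a sample $z_i$ is the KL divergence between diagonal Gaussian Laplace approximations of the weight posterior fit with and without $z_i$; the function names, the example inputs, and the `prior_precision` default are all hypothetical, and the paper itself defines the exact estimator.

```python
# Illustrative sketch only: assumes LSI(z_i) is the KL divergence between
# Laplace-approximated weight posteriors fit on D and on D \ {z_i}.
import numpy as np

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """Closed-form KL( N(mu_p, diag(var_p)) || N(mu_q, diag(var_q)) )."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def diag_laplace_posterior(map_params, hessian_diag, prior_precision=1.0):
    """Diagonal Laplace approximation: mean = MAP weights,
    variance = inverse of (loss-Hessian diagonal + prior precision)."""
    return map_params, 1.0 / (hessian_diag + prior_precision)

# Hypothetical inputs: MAP weights and Hessian diagonals from two runs,
# one trained on the full dataset D, one on D without the sample z_i.
theta_full, h_full = np.array([0.8, -1.2, 0.3]), np.array([50.0, 40.0, 60.0])
theta_loo,  h_loo  = np.array([0.7, -1.1, 0.4]), np.array([48.0, 42.0, 55.0])

mu_p, var_p = diag_laplace_posterior(theta_full, h_full)
mu_q, var_q = diag_laplace_posterior(theta_loo, h_loo)

# A larger score means z_i shifts the posterior more, i.e. is more informative.
print(f"LSI-style score for z_i: {kl_diag_gaussians(mu_p, var_p, mu_q, var_q):.4f}")
```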

Cite

Text

Kaiser et al. "Laplace Sample Information: Data Informativeness Through a Bayesian Lens." International Conference on Learning Representations, 2025.

Markdown

[Kaiser et al. "Laplace Sample Information: Data Informativeness Through a Bayesian Lens." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/kaiser2025iclr-laplace/)

BibTeX

@inproceedings{kaiser2025iclr-laplace,
  title     = {{Laplace Sample Information: Data Informativeness Through a Bayesian Lens}},
  author    = {Kaiser, Johannes and Schwethelm, Kristian and Rueckert, Daniel and Kaissis, Georgios},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/kaiser2025iclr-laplace/}
}