Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses

Abstract

Sample compression theory provides generalization guarantees for predictors that can be fully defined using a subset of the training dataset and a (short) message string, generally defined as a binary sequence. Previous works provided generalization bounds for the zero-one loss, which is restrictive, notably when applied to deep learning approaches. In this paper, we present a general framework for deriving new sample compression bounds that hold for real-valued, unbounded losses. Using the Pick-To-Learn (P2L) meta-algorithm, which transforms the training method of any machine learning predictor to yield sample-compressed predictors, we empirically demonstrate the tightness of the bounds and their versatility by evaluating them on random forests and multiple types of neural networks.
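
To make the P2L mechanism referenced above concrete, here is a minimal, illustrative sketch of its greedy loop, assuming a scikit-learn-style fit/predict interface and absolute error as a stand-in for the real-valued loss; the function and parameter names are hypothetical and do not reproduce the paper's implementation. The key property is that the returned predictor is fully determined by the compression set, which is what makes sample compression bounds applicable.

import numpy as np

def pick_to_learn(model, X, y, tol=0.0, max_size=None):
    """Greedy sketch of the Pick-To-Learn (P2L) meta-algorithm.

    Repeatedly trains `model` on a growing compression set and adds
    the remaining training example on which the model errs the most,
    stopping once every left-out example is predicted within `tol`.
    """
    n = len(X)
    compression = [0]                # seed the compression set with one example
    remaining = set(range(1, n))
    max_size = max_size or n

    while remaining and len(compression) < max_size:
        model.fit(X[compression], y[compression])
        idx = list(remaining)
        errors = np.abs(model.predict(X[idx]) - y[idx])  # per-example loss
        worst = int(np.argmax(errors))
        if errors[worst] <= tol:     # all left-out points are well predicted
            break
        remaining.discard(idx[worst])
        compression.append(idx[worst])

    return model, compression

Because the final model depends on the training set only through the examples indexed by `compression` (plus any short message encoding stopping information), its generalization can be bounded by sample compression arguments such as those developed in the paper.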

Cite

Text

Bazinet et al. "Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Bazinet et al. "Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/bazinet2025aistats-sample/)

BibTeX

@inproceedings{bazinet2025aistats-sample,
  title     = {{Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses}},
  author    = {Bazinet, Mathieu and Zantedeschi, Valentina and Germain, Pascal},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {3286--3294},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/bazinet2025aistats-sample/}
}