Generalization Bounds in the Presence of Outliers: A Median-of-Means Study
Abstract
In contrast to the empirical mean, the Median-of-Means (MoM) is an estimator of the mean $\theta$ of a square-integrable random variable $Z$ around which accurate nonasymptotic confidence bounds can be built, even when $Z$ does not exhibit sub-Gaussian tail behavior. Thanks to the high confidence it achieves on heavy-tailed data, MoM has found various applications in machine learning, where it is used to design training procedures that are insensitive to atypical observations. More recently, a new line of work has sought to characterize and leverage MoM's ability to deal with corrupted data. In this context, the present work proposes a general study of MoM's concentration properties under the contamination regime, providing a clear understanding of the impact of the proportion of outliers and of the number of blocks chosen. The analysis is extended to (multisample) U-statistics, i.e., averages over tuples of observations, which raise additional challenges due to the dependence they induce. Finally, we show that the latter bounds can be used in a straightforward fashion to derive generalization guarantees for pairwise learning in a contaminated setting, and propose an algorithm to compute provably reliable decision functions.
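The MoM estimator described in the abstract admits a very short implementation: partition the sample into blocks, average within each block, and return the median of the block means. The following is a minimal sketch (the function name, block-splitting choice, and heavy-tailed test distribution are illustrative assumptions, not the paper's exact experimental setup):

```python
import numpy as np

def median_of_means(z, n_blocks):
    """Median-of-Means estimate of E[Z].

    Splits the sample into `n_blocks` (nearly) equal-sized blocks,
    averages within each block, and returns the median of those means.
    """
    z = np.asarray(z, dtype=float)
    blocks = np.array_split(z, n_blocks)          # illustrative splitting choice
    block_means = [block.mean() for block in blocks]
    return float(np.median(block_means))

# Heavy-tailed example: a Student-t sample with 2.5 degrees of freedom
# (finite variance, so square-integrable, but far from sub-Gaussian).
rng = np.random.default_rng(0)
sample = rng.standard_t(df=2.5, size=10_000)      # true mean is 0
print(median_of_means(sample, n_blocks=50))
```

The number of blocks governs the trade-off the paper analyzes: more blocks yield higher confidence and more robustness to outliers (a few corrupted blocks cannot move the median), at the cost of each block mean being noisier.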
Cite
Text
Laforgue et al. "Generalization Bounds in the Presence of Outliers: A Median-of-Means Study." International Conference on Machine Learning, 2021.
Markdown
[Laforgue et al. "Generalization Bounds in the Presence of Outliers: A Median-of-Means Study." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/laforgue2021icml-generalization/)
BibTeX
@inproceedings{laforgue2021icml-generalization,
title = {{Generalization Bounds in the Presence of Outliers: A Median-of-Means Study}},
author = {Laforgue, Pierre and Staerman, Guillaume and Clémençon, Stephan},
booktitle = {International Conference on Machine Learning},
year = {2021},
pages = {5937--5947},
volume = {139},
url = {https://mlanthology.org/icml/2021/laforgue2021icml-generalization/}
}