A Pseudo-Metric Between Probability Distributions Based on Depth-Trimmed Regions
Abstract
The design of a metric between probability distributions is a longstanding problem motivated by numerous applications in machine learning. Focusing on continuous probability distributions in the Euclidean space $\mathbb{R}^d$, we introduce a novel pseudo-metric between probability distributions by leveraging the extension of univariate quantiles to multivariate spaces. Data depth is a nonparametric statistical tool that measures the centrality of any element $x\in\mathbb{R}^d$ with respect to (w.r.t.) a probability distribution or a dataset. It is a natural median-oriented extension of the cumulative distribution function (cdf) to the multivariate case. Thus, its upper-level sets, the depth-trimmed regions, give rise to a definition of multivariate quantiles. The new pseudo-metric relies on the average of the Hausdorff distance between the depth-based quantile regions of each distribution. Its good behavior under major transformation groups and its ability to factor out translations are shown. Robustness, an appealing feature of this pseudo-metric, is studied through the finite-sample breakdown point. Moreover, we propose an efficient approximation method whose time complexity is linear in the size of the dataset and in its dimension. The quality of this approximation and the performance of the proposed approach are illustrated in numerical experiments.
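To make the construction concrete, here is a minimal Python sketch of the idea of averaging Hausdorff distances between depth-trimmed regions. It is an illustration under stated assumptions and not the authors' approximation method: halfspace depth is approximated with random projections, each region is taken to be the set of sample points whose depth is at least a given quantile of the sample's own depths, and the function names (`approx_halfspace_depth`, `hausdorff`, `depth_region_distance`) and default parameters are hypothetical.

```python
import numpy as np


def approx_halfspace_depth(points, data, directions):
    """Approximate halfspace depth of each row of `points` w.r.t. `data`.

    Assumption: the depth is approximated by the minimum, over a finite set
    of unit directions, of the univariate halfspace depth of the projection.
    """
    n = data.shape[0]
    proj_data = np.sort(data @ directions.T, axis=0)  # (n, k), sorted per direction
    proj_pts = points @ directions.T                  # (m, k)
    depth = np.ones(points.shape[0])
    for j in range(directions.shape[0]):
        # Fraction of projected data <= the point, and fraction >= the point.
        below = np.searchsorted(proj_data[:, j], proj_pts[:, j], side="right") / n
        above = 1.0 - np.searchsorted(proj_data[:, j], proj_pts[:, j], side="left") / n
        depth = np.minimum(depth, np.minimum(below, above))
    return depth


def hausdorff(a, b):
    """Hausdorff distance between two finite point sets (rows of a and b)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def depth_region_distance(x, y, n_directions=100,
                          levels=(0.1, 0.3, 0.5, 0.7, 0.9), seed=0):
    """Average Hausdorff distance between empirical depth-trimmed regions.

    Assumption: the region at level q is the set of sample points whose
    depth is at least the q-th quantile of that sample's depths.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_directions, x.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # random unit directions
    depth_x = approx_halfspace_depth(x, x, u)
    depth_y = approx_halfspace_depth(y, y, u)
    total = 0.0
    for q in levels:
        region_x = x[depth_x >= np.quantile(depth_x, q)]
        region_y = y[depth_y >= np.quantile(depth_y, q)]
        total += hausdorff(region_x, region_y)
    return total / len(levels)
```

Calling `depth_region_distance(x, y)` on two arrays of shape `(n, d)` and `(m, d)` returns a non-negative number that is zero when both samples are identical. The pairwise-distance Hausdorff computation used here is quadratic in the sample size, so this sketch does not reproduce the linear-time approximation mentioned in the abstract.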
Cite
Text
Staerman et al. "A Pseudo-Metric Between Probability Distributions Based on Depth-Trimmed Regions." Transactions on Machine Learning Research, 2024.
Markdown
[Staerman et al. "A Pseudo-Metric Between Probability Distributions Based on Depth-Trimmed Regions." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/staerman2024tmlr-pseudometric/)
BibTeX
@article{staerman2024tmlr-pseudometric,
title = {{A Pseudo-Metric Between Probability Distributions Based on Depth-Trimmed Regions}},
author = {Staerman, Guillaume and Mozharovskyi, Pavlo and Colombo, Pierre and Clémençon, Stephan and d'Alché-Buc, Florence},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/staerman2024tmlr-pseudometric/}
}