A Kernel Method for the Two-Sample-Problem
Abstract
We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. The test statistic can be computed in O(m^2) time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.
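To make the statistic concrete, here is a minimal sketch of the squared distance between the two sample means in an RKHS, written with the kernel trick. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the choice of the biased (V-statistic) estimate, and the function names `gaussian_kernel` and `mmd2_biased` are all chosen here for the example. The three kernel matrices are what give the quadratic cost in the sample size.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a and rows of b.

    The kernel and the bandwidth sigma are assumptions made for this sketch.
    """
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2_biased(x, y, sigma=1.0):
    """Biased estimate of the squared RKHS distance between sample means.

    Expanding ||mean(phi(x)) - mean(phi(y))||^2 gives
    mean(K_xx) + mean(K_yy) - 2 * mean(K_xy).
    """
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution as x
z = rng.normal(2.0, 1.0, size=(200, 2))  # mean-shifted distribution

same = mmd2_biased(x, y)
different = mmd2_biased(x, z)
```

In this sketch `different` should come out clearly larger than `same`. Turning the statistic into a test still requires a rejection threshold, which the abstract obtains either from a large deviation bound or from the asymptotic distribution of the statistic; neither is implemented here.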
Cite
Text
Gretton et al. "A Kernel Method for the Two-Sample-Problem." Neural Information Processing Systems, 2006.
Markdown
[Gretton et al. "A Kernel Method for the Two-Sample-Problem." Neural Information Processing Systems, 2006.](https://mlanthology.org/neurips/2006/gretton2006neurips-kernel/)
BibTeX
@inproceedings{gretton2006neurips-kernel,
title = {{A Kernel Method for the Two-Sample-Problem}},
author = {Gretton, Arthur and Borgwardt, Karsten and Rasch, Malte and Schölkopf, Bernhard and Smola, Alex J.},
booktitle = {Neural Information Processing Systems},
year = {2006},
pages = {513-520},
url = {https://mlanthology.org/neurips/2006/gretton2006neurips-kernel/}
}