A Kernel Method for the Two-Sample-Problem

Abstract

We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. The test statistic can be computed in O(m^2) time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.
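To make the test statistic concrete, the following is a minimal sketch of the squared RKHS distance between the two sample means, expanded through kernel evaluations. It illustrates where the quadratic cost comes from: every pair of points is evaluated once. The Gaussian kernel, the bandwidth `sigma`, and the function names `gaussian_kernel` and `mmd2_biased` are assumptions made for illustration and are not taken from the abstract. The sketch computes only the statistic, not the acceptance thresholds of either test.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel matrix between rows of a and rows of b.

    The kernel choice and the bandwidth sigma are illustrative assumptions.
    """
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd2_biased(x, y, sigma=1.0):
    """Squared distance between the empirical kernel mean embeddings of x and y.

    Expands ||mean(phi(x)) - mean(phi(y))||^2 into three kernel averages,
    which costs O(m^2) kernel evaluations for samples of size m.
    """
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(200, 2))
    y_same = rng.normal(0.0, 1.0, size=(200, 2))
    y_shifted = rng.normal(2.0, 1.0, size=(200, 2))
    print("same distribution:   ", mmd2_biased(x, y_same))
    print("shifted distribution:", mmd2_biased(x, y_shifted))
```

With samples drawn from the same distribution the value should be close to zero, while a shift in the mean should produce a visibly larger value. Turning the statistic into a test still requires a threshold, for example from a large deviation bound or from the asymptotic distribution, as the abstract describes.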

Cite

Text

Gretton et al. "A Kernel Method for the Two-Sample-Problem." Neural Information Processing Systems, 2006.

Markdown

[Gretton et al. "A Kernel Method for the Two-Sample-Problem." Neural Information Processing Systems, 2006.](https://mlanthology.org/neurips/2006/gretton2006neurips-kernel/)

BibTeX

@inproceedings{gretton2006neurips-kernel,
  title     = {{A Kernel Method for the Two-Sample-Problem}},
  author    = {Gretton, Arthur and Borgwardt, Karsten and Rasch, Malte and Schölkopf, Bernhard and Smola, Alex J.},
  booktitle = {Neural Information Processing Systems},
  year      = {2006},
  pages     = {513-520},
  url       = {https://mlanthology.org/neurips/2006/gretton2006neurips-kernel/}
}