Two-Sample Testing Using Deep Learning

Abstract

We propose a two-sample testing procedure based on learned deep neural network representations. To this end, we define two test statistics that perform an asymptotic location test on data samples mapped onto a hidden layer. The tests are consistent and asymptotically control the type-1 error rate. Their test statistics can be evaluated in linear time (in the sample size). Suitable data representations are obtained in a data-driven way, by solving a supervised or unsupervised transfer-learning task on an auxiliary (potentially distinct) data set. If no auxiliary data is available, we split the data into two chunks: one for learning representations and one for computing the test statistic. In experiments on audio samples, natural images, and three-dimensional neuroimaging data, our tests yield significant decreases in type-2 error rate (up to 35 percentage points) compared to state-of-the-art two-sample tests such as kernel methods and classifier two-sample tests.
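To make the idea of a location test on hidden-layer representations concrete, the following is a minimal sketch and not the authors' implementation. It assumes a generic Hotelling-type statistic with a chi-squared limit, which may differ from the two statistics defined in the paper. The names `feature_map` and `location_test`, the `ridge` parameter, and the random `tanh` projection standing in for a pretrained network layer are all hypothetical.

```python
import numpy as np
from scipy import stats


def feature_map(x, weights):
    """Hypothetical stand-in for a pretrained hidden layer (assumption)."""
    return np.tanh(x @ weights)


def location_test(feats_x, feats_y, ridge=1e-8):
    """Asymptotic Hotelling-type test for equal feature means.

    Assumption: a generic mean-difference test, not necessarily the
    statistic used in the paper. Cost is linear in the sample size
    (means and covariances are single passes over the data).
    """
    n, d = feats_x.shape
    m = feats_y.shape[0]

    mean_x = feats_x.mean(axis=0)
    mean_y = feats_y.mean(axis=0)
    diff = mean_x - mean_y

    # Pooled covariance of the features, with a small ridge term
    # (hypothetical choice) to keep the linear solve well conditioned.
    cx = feats_x - mean_x
    cy = feats_y - mean_y
    pooled = (cx.T @ cx + cy.T @ cy) / (n + m - 2)
    pooled += ridge * np.eye(d)

    # Under the null of equal means, the statistic is approximately
    # chi-squared distributed with d degrees of freedom for large n, m.
    stat = (n * m / (n + m)) * (diff @ np.linalg.solve(pooled, diff))
    p_value = stats.chi2.sf(stat, df=d)
    return stat, p_value


# Small demo on synthetic data (all values here are illustrative).
rng = np.random.default_rng(0)
weights = rng.normal(size=(20, 5)) / np.sqrt(20)  # "frozen" representation
sample_p = rng.normal(size=(500, 20))
sample_q = rng.normal(size=(500, 20)) + 0.5       # shifted distribution

stat, p_value = location_test(feature_map(sample_p, weights),
                              feature_map(sample_q, weights))
print(f"statistic={stat:.2f}, p-value={p_value:.4g}")
```

In this sketch the representation is fixed before the test data are seen, mirroring the abstract's requirement that features come from auxiliary data or from a held-out chunk; reusing the same samples to learn the features and to compute the statistic would invalidate the null distribution.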

Cite

Text

Kirchler et al. "Two-Sample Testing Using Deep Learning." Artificial Intelligence and Statistics, 2020.

Markdown

[Kirchler et al. "Two-Sample Testing Using Deep Learning." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/kirchler2020aistats-twosample/)

BibTeX

@inproceedings{kirchler2020aistats-twosample,
  title     = {{Two-Sample Testing Using Deep Learning}},
  author    = {Kirchler, Matthias and Khorasani, Shahryar and Kloft, Marius and Lippert, Christoph},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {1387-1398},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/kirchler2020aistats-twosample/}
}