E-Valuating Classifier Two-Sample Tests

Abstract

We introduce E-C2ST, a powerful deep classifier two-sample test for high-dimensional data based on E-values. Our test combines ideas from existing work on split likelihood ratio tests and predictive independence tests. The resulting E-values are suitable for anytime-valid sequential two-sample tests, which allows for more effective use of data in constructing test statistics. Through simulations and real data applications, we empirically demonstrate that E-C2ST achieves enhanced statistical power by partitioning datasets into multiple batches, beyond the conventional two-split (training and testing) approach of standard classifier two-sample tests, while keeping the type I error well below the desired significance level.
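
To make the batching idea concrete, here is a minimal sketch of a sequential classifier two-sample test built on E-values. It is an illustration under stated assumptions, not the authors' implementation: a NumPy logistic regression stands in for the deep classifier, the null likelihood is taken to be the label frequencies estimated on each evaluation batch (in the spirit of a split likelihood ratio), and all function names (`fit_logistic`, `sequential_e_c2st`, `make_data`) are hypothetical. Each batch is scored by a classifier trained only on earlier batches, the per-batch E-values are multiplied, and the test rejects as soon as the running product reaches 1/alpha.

```python
import numpy as np


def fit_logistic(X, y, steps=300, lr=0.5):
    """Fit logistic regression by plain gradient descent (stand-in for a deep classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    """Predicted probability of label 1, clipped away from 0 and 1 for stable logs."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return np.clip(p, 1e-6, 1.0 - 1e-6)


def label_log_lik(y, p1):
    """Log-likelihood of binary labels y under probabilities p1 for label 1."""
    return float(np.sum(np.where(y == 1, np.log(p1), np.log(1.0 - p1))))


def sequential_e_c2st(X, y, n_batches=5, alpha=0.05):
    """Return (reject, log_e_value, batches_evaluated).

    Assumption: the first batch is used for training only; every later batch
    is evaluated by a classifier trained on all earlier batches.
    """
    batches = np.array_split(np.arange(len(y)), n_batches)
    log_e = 0.0
    for b in range(1, n_batches):
        train, test = np.concatenate(batches[:b]), batches[b]
        w = fit_logistic(X[train], y[train])
        # Numerator: likelihood of held-out labels under the trained classifier.
        log_num = label_log_lik(y[test], predict_proba(w, X[test]))
        # Denominator (assumed null model): labels independent of x, with the
        # label frequency estimated on the evaluation batch itself.
        pi = np.clip(y[test].mean(), 1e-6, 1.0 - 1e-6)
        log_den = label_log_lik(y[test], np.full(len(test), pi))
        log_e += log_num - log_den
        if log_e >= np.log(1.0 / alpha):  # running E-value reached 1/alpha
            return True, log_e, b
    return False, log_e, n_batches - 1


def make_data(n_per_class, dim, shift, seed):
    """Hypothetical toy data: two Gaussians differing by a mean shift, shuffled."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, dim)),
                   rng.normal(shift, 1.0, (n_per_class, dim))])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    perm = rng.permutation(len(y))
    return X[perm], y[perm]
```

Calling `sequential_e_c2st(*make_data(500, 5, 2.0, seed=0))` runs the test on two clearly separated samples. Working with the logarithm of the E-value avoids numerical overflow when many batches are multiplied, and stopping at the first crossing of 1/alpha is what makes the procedure sequential rather than a single train/test split.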

Cite

Text

Pandeva et al. "E-Valuating Classifier Two-Sample Tests." Transactions on Machine Learning Research, 2024.

Markdown

[Pandeva et al. "E-Valuating Classifier Two-Sample Tests." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/pandeva2024tmlr-evaluating/)

BibTeX

@article{pandeva2024tmlr-evaluating,
  title     = {{E-Valuating Classifier Two-Sample Tests}},
  author    = {Pandeva, Teodora and Bakker, Tim and Naesseth, Christian A. and Forré, Patrick},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/pandeva2024tmlr-evaluating/}
}