E-Valuating Classifier Two-Sample Tests
Abstract
We introduce E-C2ST, a powerful deep classifier two-sample test for high-dimensional data based on E-values. The test combines ideas from existing work on split likelihood ratio tests and predictive independence tests. The resulting E-values are suitable for anytime-valid sequential two-sample testing, which allows data to be used more effectively when constructing test statistics. Through simulations and real-data applications, we empirically demonstrate that E-C2ST gains statistical power by partitioning datasets into multiple batches, going beyond the conventional two-way split (training and testing) of standard classifier two-sample tests, while keeping the type I error well below the desired significance level.
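The batch-wise idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a plain logistic regression in place of a deep classifier, a null hypothesis under which labels are independent of the features, and a null likelihood given by the label frequency estimated on each evaluation batch (in the spirit of a split likelihood ratio). All function names (`fit_logistic`, `predict_proba`, `sequential_log_e_values`) and parameters are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, steps=300, lr=0.1):
    """Gradient-descent logistic regression (assumed stand-in for a deep classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Predicted probability of label 1, clipped away from 0 and 1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    return np.clip(p, 1e-6, 1 - 1e-6)

def sequential_log_e_values(X, y, n_batches=4):
    """Running log E-value after each evaluation batch.

    Batch b is scored by a classifier trained only on batches 0..b-1,
    so every batch after the first is used for both testing and training.
    """
    batches = np.array_split(np.arange(len(y)), n_batches)
    log_e, path = 0.0, []
    for b in range(1, n_batches):
        train, test = np.concatenate(batches[:b]), batches[b]
        w = fit_logistic(X[train], y[train])
        p1 = predict_proba(w, X[test])
        # Log-likelihood of the held-out labels under the classifier.
        ll_alt = np.sum(np.where(y[test] == 1, np.log(p1), np.log(1 - p1)))
        # Null model: labels ignore the features; label frequency fitted on this batch.
        pi = np.clip(y[test].mean(), 1e-6, 1 - 1e-6)
        ll_null = np.sum(np.where(y[test] == 1, np.log(pi), np.log(1 - pi)))
        log_e += ll_alt - ll_null  # multiply E-values = add log E-values
        path.append(log_e)
    return np.array(path)

# Toy data: two Gaussian samples that differ by a mean shift of 1 per coordinate.
rng = np.random.default_rng(0)
n, d = 400, 5
X = np.vstack([rng.normal(0.0, 1.0, (n, d)), rng.normal(1.0, 1.0, (n, d))])
y = np.concatenate([np.zeros(n), np.ones(n)])
perm = rng.permutation(2 * n)  # shuffle so every batch mixes both samples
X, y = X[perm], y[perm]

alpha = 0.05
path = sequential_log_e_values(X, y)
# Reject as soon as the running E-value reaches 1/alpha.
reject = bool(np.any(path >= np.log(1 / alpha)))
print(path, reject)
```

In this sketch the running product is compared against 1/alpha after every batch, which is what makes early stopping possible; the threshold and the choice of four batches are illustrative only.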
Cite
Text
Pandeva et al. "E-Valuating Classifier Two-Sample Tests." Transactions on Machine Learning Research, 2024.
Markdown
[Pandeva et al. "E-Valuating Classifier Two-Sample Tests." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/pandeva2024tmlr-evaluating/)
BibTeX
@article{pandeva2024tmlr-evaluating,
title = {{E-Valuating Classifier Two-Sample Tests}},
author = {Pandeva, Teodora and Bakker, Tim and Naesseth, Christian A. and Forré, Patrick},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/pandeva2024tmlr-evaluating/}
}