A Label Efficient Two-Sample Test
Abstract
Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or of two different distributions (the alternative hypothesis). We consider a new setting for this problem in which sample features are easy to measure, whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework for performing an effective two-sample test with only a small number of label queries: first, a classifier is trained on samples labeled via uniform querying to model the posterior probabilities of the labels; second, a novel query scheme, dubbed bimodal query, is used to query the labels of samples from both classes; and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments on several datasets demonstrate that the proposed test controls the Type I error and achieves a lower Type II error than uniform querying and certainty-based querying. Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Label-Efficient-Two-Sample.
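The last stage of the framework applies the classical Friedman-Rafsky test. As a rough illustration only, independent of the authors' released code and not covering the classifier or bimodal query stages, the sketch below builds a Euclidean minimum spanning tree (MST) over the pooled queried samples, counts the MST edges that join points from different samples, and treats a small count as evidence against the null. The function names `mst_edges` and `fr_test`, the Euclidean metric, and the label-permutation calibration (used here instead of an asymptotic normal approximation) are assumptions made for this example.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist


def mst_edges(x):
    """Return (row, col) index arrays of a Euclidean MST over the rows of x."""
    dist = cdist(x, x)
    # SciPy treats zero entries as "no edge", so shift every off-diagonal
    # weight by 1. A constant shift adds the same amount to every spanning
    # tree, so the MST is unchanged, but zero-distance pairs (duplicate
    # points) remain valid edges.
    dist += 1.0
    np.fill_diagonal(dist, 0.0)
    mst = minimum_spanning_tree(dist).tocoo()
    return mst.row, mst.col


def fr_test(x0, x1, n_perm=1000, seed=0):
    """Hypothetical FR cross-edge test with a permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.vstack([x0, x1])
    labels = np.concatenate([np.zeros(len(x0), dtype=int),
                             np.ones(len(x1), dtype=int)])
    rows, cols = mst_edges(x)  # the MST depends only on the pooled points
    observed = int(np.sum(labels[rows] != labels[cols]))
    # Under the null the sample labels are exchangeable, so permuting them
    # over the fixed MST simulates the null distribution of the statistic.
    null = np.empty(n_perm, dtype=int)
    for b in range(n_perm):
        perm = rng.permutation(labels)
        null[b] = np.sum(perm[rows] != perm[cols])
    # Few cross edges means the samples separate (evidence against H0),
    # so the p-value is the lower tail, with the usual +1 correction.
    p_value = (1 + np.sum(null <= observed)) / (n_perm + 1)
    return observed, p_value


# Usage: two Gaussian clouds with shifted means.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(50, 2))
b = rng.normal(3.0, 1.0, size=(50, 2))
print(fr_test(a, b))
```

Because the MST is computed once and only the labels are permuted, each permutation costs a single pass over the n - 1 tree edges. Well-separated samples yield few cross edges and a small p-value, while samples from a common distribution give counts near the permutation mean.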
Cite
Li et al. "A Label Efficient Two-Sample Test." Uncertainty in Artificial Intelligence, 2022.
[Li et al. "A Label Efficient Two-Sample Test." Uncertainty in Artificial Intelligence, 2022.](https://mlanthology.org/uai/2022/li2022uai-label/)
@inproceedings{li2022uai-label,
title = {{A Label Efficient Two-Sample Test}},
author = {Li, Weizhi and Dasarathy, Gautam and Ramamurthy, Karthikeyan Natesan and Berisha, Visar},
booktitle = {Uncertainty in Artificial Intelligence},
year = {2022},
pages = {1168--1177},
volume = {180},
url = {https://mlanthology.org/uai/2022/li2022uai-label/}
}