A High-Dimensional Convergence Theorem for U-Statistics with Applications to Kernel-Based Testing

Abstract

We prove a convergence theorem for U-statistics of degree two, where the data dimension $d$ is allowed to scale with the sample size $n$. We find that the limiting distribution of a U-statistic undergoes a phase transition from the non-degenerate Gaussian limit to the degenerate limit, regardless of whether the U-statistic is degenerate, and depending only on a moment ratio. A surprising consequence is that a non-degenerate U-statistic in high dimensions can have a non-Gaussian limit with a larger variance and an asymmetric distribution. Our bounds are valid for any finite $n$ and $d$, do not depend on individual eigenvalues of the underlying function, and are dimension-independent under a mild assumption. As an application, we use our theory to study two popular kernel-based distribution tests, the maximum mean discrepancy (MMD) and the kernel Stein discrepancy (KSD), whose high-dimensional performance has been challenging to study. In a simple empirical setting, our results correctly predict how the test power at a fixed threshold scales with $d$ and the kernel bandwidth.
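To make the object of study concrete, below is a minimal sketch (not the paper's code) of the degree-two U-statistic estimator of the squared MMD with a Gaussian kernel; the function names, the bandwidth parameterization, and the sample setup are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2_ustatistic(x, y, bandwidth):
    """Unbiased estimator of squared MMD: a U-statistic of degree two.

    Averages h((x_i, y_i), (x_j, y_j)) = k(x_i, x_j) + k(y_i, y_j)
                                         - k(x_i, y_j) - k(x_j, y_i)
    over all pairs i != j.
    """
    n = x.shape[0]
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude i == j terms, as required for an unbiased U-statistic.
    np.fill_diagonal(kxx, 0.0)
    np.fill_diagonal(kyy, 0.0)
    np.fill_diagonal(kxy, 0.0)
    return (kxx.sum() + kyy.sum() - 2.0 * kxy.sum()) / (n * (n - 1))

# Illustrative use: sweep d and bandwidth to probe high-dimensional behaviour.
rng = np.random.default_rng(0)
n, d = 200, 50
x = rng.normal(size=(n, d))          # P = N(0, I_d)
y = rng.normal(size=(n, d)) + 0.2    # Q = mean-shifted Gaussian (hypothetical alternative)
print(mmd2_ustatistic(x, y, bandwidth=np.sqrt(d)))  # sqrt(d) bandwidth scaling is illustrative
```

Per the abstract, whether such a statistic behaves like its non-degenerate Gaussian limit or its degenerate limit as $d$ grows with $n$ is governed by a moment ratio, not by degeneracy alone.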

Cite

Text

Huang et al. "A High-Dimensional Convergence Theorem for U-Statistics with Applications to Kernel-Based Testing." Conference on Learning Theory, 2023.

Markdown

[Huang et al. "A High-Dimensional Convergence Theorem for U-Statistics with Applications to Kernel-Based Testing." Conference on Learning Theory, 2023.](https://mlanthology.org/colt/2023/huang2023colt-highdimensional/)

BibTeX

@inproceedings{huang2023colt-highdimensional,
  title     = {{A High-Dimensional Convergence Theorem for U-Statistics with Applications to Kernel-Based Testing}},
  author    = {Huang, Kevin H. and Liu, Xing and Duncan, Andrew and Gandy, Axel},
  booktitle = {Conference on Learning Theory},
  year      = {2023},
  pages     = {3827--3918},
  volume    = {195},
  url       = {https://mlanthology.org/colt/2023/huang2023colt-highdimensional/}
}