Gaussian Sketching Yields a J-L Lemma in RKHS

Abstract

The main contribution of the paper is to show that Gaussian sketching of a kernel-Gram matrix $\bm K$ yields an operator whose counterpart in an RKHS $\cal H$ is a \emph{random projection} operator, in the spirit of the Johnson-Lindenstrauss (J-L) lemma. To be precise, given a random matrix $Z$ with i.i.d. Gaussian entries, we show that a sketch $Z\bm{K}$ corresponds to a particular random operator in the (infinite-dimensional) Hilbert space $\cal H$ that maps functions $f \in \cal H$ to a low-dimensional space $\mathbb{R}^d$, while preserving a weighted RKHS inner product of the form $\langle f, g \rangle_{\Sigma} \doteq \langle f, \Sigma^3 g \rangle_{\cal H}$, where $\Sigma$ is the \emph{covariance} operator induced by the data distribution. In particular, under assumptions similar to those of kernel PCA (KPCA) or kernel $k$-means (K-$k$-means), well-separated subsets of the feature space $\{K(\cdot, x): x \in \cal X\}$ remain well-separated after this operation, which suggests benefits similar to those of KPCA and/or K-$k$-means, albeit at the much cheaper cost of a random projection. Moreover, our convergence rates suggest that, given a large dataset $\{X_i\}_{i=1}^N$ of size $N$, we can build the Gram matrix $\bm K$ on a much smaller subsample of size $n\ll N$, so that the sketch $Z\bm K$ is very cheap to obtain and subsequently apply as a projection operator on the original data $\{X_i\}_{i=1}^N$. We verify these insights empirically on synthetic data and on real-world clustering applications.
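The pipeline described in the abstract (Gram matrix on a subsample of size $n$, Gaussian sketch $Z\bm K$, then projection of all $N$ points) can be illustrated with a minimal NumPy sketch. Several things here are assumptions for illustration and are not taken from the paper: the Gaussian (RBF) kernel and its `bandwidth`, the function names `rbf_kernel` and `gaussian_sketch_embed`, and the normalization constant, which is chosen only so that inner products of embeddings are on the scale of an empirical $\Sigma^3$-weighted inner product.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B.

    The kernel choice is an assumption; any positive-definite kernel could be used.
    """
    sq_dists = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def gaussian_sketch_embed(X, n, d, bandwidth=1.0, seed=0):
    """Map each row of X (N x p) to R^d via a Gaussian sketch of a subsample Gram matrix.

    Hypothetical helper: the scaling 1 / (n**1.5 * sqrt(d)) is an assumption.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]

    # 1. Subsample n << N points and build the n x n Gram matrix K on them.
    idx = rng.choice(N, size=n, replace=False)
    S = X[idx]
    K = rbf_kernel(S, S, bandwidth)

    # 2. Gaussian sketch: Z has i.i.d. N(0, 1) entries, so Z @ K is d x n.
    Z = rng.standard_normal((d, n))
    sketch = Z @ K

    # 3. Represent each original point x by its kernel evaluations against the
    #    subsample, (K(S_1, x), ..., K(S_n, x)), and apply the sketch to it.
    K_cross = rbf_kernel(S, X, bandwidth)  # n x N
    return (sketch @ K_cross).T / (n ** 1.5 * np.sqrt(d))  # N x d


# Example usage on synthetic data: 1000 points in R^5 embedded into R^10.
X = np.random.default_rng(1).standard_normal((1000, 5))
embedding = gaussian_sketch_embed(X, n=50, d=10)
```

Under these assumptions the per-point cost is dominated by the $n$ kernel evaluations against the subsample, and the resulting $N \times d$ array can be passed to an ordinary clustering routine such as $k$-means.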

Cite

Text

Kpotufe and Sriperumbudur. "Gaussian Sketching Yields a J-L Lemma in RKHS." Artificial Intelligence and Statistics, 2020.

Markdown

[Kpotufe and Sriperumbudur. "Gaussian Sketching Yields a J-L Lemma in RKHS." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/kpotufe2020aistats-gaussian/)

BibTeX

@inproceedings{kpotufe2020aistats-gaussian,
  title     = {{Gaussian Sketching Yields a J-L Lemma in RKHS}},
  author    = {Kpotufe, Samory and Sriperumbudur, Bharath},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {3928-3937},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/kpotufe2020aistats-gaussian/}
}