Low-Rank Thinning

Abstract

The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.
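The kernel-based quality measures mentioned above typically score a summary by the maximum mean discrepancy (MMD) it induces against the full dataset. As an illustration only, and not the paper's Kernel Halving or Compress algorithms, the sketch below builds the uniform-subsampling baseline and scores it with a Gaussian-kernel MMD; the kernel, bandwidth, and dataset sizes are arbitrary assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2 * X @ Y.T
    )
    return np.exp(-sq_dists / (2 * bandwidth**2))

def mmd(X, Y, bandwidth=1.0):
    """Plug-in estimate of the kernel maximum mean discrepancy between X and Y."""
    kxx = gaussian_kernel(X, X, bandwidth).mean()
    kyy = gaussian_kernel(Y, Y, bandwidth).mean()
    kxy = gaussian_kernel(X, Y, bandwidth).mean()
    return np.sqrt(max(kxx + kyy - 2 * kxy, 0.0))

# Thinning setup: summarize n points with m << n representatives, then score
# the summary by its kernel MMD to the full dataset. Uniform subsampling is
# the baseline that sub-Gaussian thinning algorithms are compared against.
rng = np.random.default_rng(0)
n, m, d = 4096, 64, 2
X = rng.standard_normal((n, d))                     # full dataset (illustrative)
summary = X[rng.choice(n, size=m, replace=False)]   # uniform-subsampling summary
print(f"MMD(full data, {m}-point summary): {mmd(X, summary):.4f}")
```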

Cite

Text

Carrell et al. "Low-Rank Thinning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Carrell et al. "Low-Rank Thinning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/carrell2025icml-lowrank/)

BibTeX

@inproceedings{carrell2025icml-lowrank,
  title     = {{Low-Rank Thinning}},
  author    = {Carrell, Annabelle Michael and Gong, Albert and Shetty, Abhishek and Dwivedi, Raaz and Mackey, Lester},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {6811--6848},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/carrell2025icml-lowrank/}
}