Differentially Private Clustering in Data Streams

Abstract

Clustering problems (such as k-means and k-median) are fundamental unsupervised machine learning primitives. Recently, these problems have been subject to large interest in the privacy literature. All prior work on private clustering, however, has been devoted to the \emph{offline} case where the entire dataset is known in advance. In this work, we focus on the more challenging private data stream setting where the aim is to design memory-efficient algorithms that process a large stream \emph{incrementally} as points arrive in a private way. Our main contribution is to provide the first differentially private algorithms for $k$-means and $k$-median clustering in data streams. In particular, our algorithms are the first to guarantee differential privacy both in the continual release and in the one-shot setting while achieving space sublinear in the stream size. We complement our theoretical results with an empirical analysis of our algorithms on real data.

Cite

Text

Epasto et al. "Differentially Private Clustering in Data Streams." ICML 2023 Workshops: TAGML, 2023.

Markdown

[Epasto et al. "Differentially Private Clustering in Data Streams." ICML 2023 Workshops: TAGML, 2023.](https://mlanthology.org/icmlw/2023/epasto2023icmlw-differentially/)

BibTeX

@inproceedings{epasto2023icmlw-differentially,
  title     = {{Differentially Private Clustering in Data Streams}},
  author    = {Epasto, Alessandro and Mukherjee, Tamalika and Zhong, Peilin},
  booktitle = {ICML 2023 Workshops: TAGML},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/epasto2023icmlw-differentially/}
}