Differentially Private Source-Target Clustering

Abstract

We consider a new private variant of the Source-Target Clustering (STC) setting, which was introduced by de Mathelin et al. (2022). In STC, a target dataset needs to be clustered by selecting centers, in addition to centers that are already provided in a separate source dataset. The goal is to select centers from the target such that the target clustering cost, given the additional source centers, is minimized. We consider private STC, in which the source dataset is private and should only be used under the constraint of differential privacy. This is motivated by scenarios in which the existing centers are private, for instance because they represent individuals in a social network. We derive lower bounds for the private STC objective, illustrating the theoretical limitations on worst-case guarantees for this setting. We then present a differentially private algorithm that achieves asymptotically advantageous results under a data-dependent analysis, in which the guarantee depends on properties of the dataset. We also present more practical variants of this algorithm. Experiments demonstrate that our practical algorithms reduce the clustering cost compared to baseline approaches.
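To make the STC objective described in the abstract concrete, here is a minimal, non-private sketch. It is not the authors' algorithm and involves no differential privacy. It assumes a k-median-style cost (sum of Euclidean distances to the nearest center), which the abstract does not specify. The names `stc_cost` and `best_target_centers` and the toy data are hypothetical, introduced only for illustration.

```python
from itertools import combinations
from math import dist


def stc_cost(target, source_centers, chosen):
    """Assumed STC cost: each target point pays the Euclidean distance to its
    nearest center, where the centers are the given source centers plus the
    centers chosen from the target."""
    centers = list(source_centers) + list(chosen)
    return sum(min(dist(t, c) for c in centers) for t in target)


def best_target_centers(target, source_centers, k):
    """Exhaustively pick k target points that minimize stc_cost.
    Exponential in k, so suitable only for tiny illustrative inputs."""
    return min(
        combinations(target, k),
        key=lambda chosen: stc_cost(target, source_centers, chosen),
    )


# Toy data: the source center already covers the cluster near the origin,
# so the single new center should be placed in the far cluster.
target = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
source_centers = [(0.0, 0.5)]

chosen = best_target_centers(target, source_centers, k=1)
cost = stc_cost(target, source_centers, chosen)
```

In this toy instance the chosen center falls in the cluster that the source center does not cover, showing how existing source centers change which target centers are worth selecting. In the private setting studied in the paper, `source_centers` could not be used directly like this, since it may only be accessed under differential privacy.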

Cite

Text

Schnapp and Sabato. "Differentially Private Source-Target Clustering." Transactions on Machine Learning Research, 2025.

Markdown

[Schnapp and Sabato. "Differentially Private Source-Target Clustering." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/schnapp2025tmlr-differentially/)

BibTeX

@article{schnapp2025tmlr-differentially,
  title     = {{Differentially Private Source-Target Clustering}},
  author    = {Schnapp, Shachar and Sabato, Sivan},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/schnapp2025tmlr-differentially/}
}