Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles

Abstract

Differentially private decision tree algorithms have been popular since the introduction of differential privacy. While many private tree-based algorithms have been proposed for supervised learning tasks, such as classification, very few extend naturally to the semi-supervised setting. In this paper, we present a framework that takes advantage of unlabelled data to reduce the noise requirement in differentially private decision forests and improve their predictive performance. The main ingredients of our approach are a median splitting criterion that creates balanced leaves, a geometric privacy budget allocation technique, and a random sampling technique to compute the private splitting point accurately. While similar ideas have existed in isolation, their combination is new and has several advantages: (1) The semi-supervised mode of operation comes for free. (2) Our framework is applicable in two different privacy settings: when only label privacy is required, and when privacy of the features is also required. (3) Empirical evidence on 18 UCI data sets and 3 synthetic data sets demonstrates that our algorithm achieves high utility compared to the current state of the art in both supervised and semi-supervised classification problems.
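To make two of the ingredients named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. The function names `geometric_budget` and `private_median`, the `ratio` parameter, the choice to give deeper levels a larger share, and the bounded feature range `[lo, hi]` are all assumptions for illustration. The split point is drawn with the standard exponential mechanism for medians, which stands in for the paper's sampling technique since the abstract does not describe it.

```python
import math
import random


def geometric_budget(total_eps, depth, ratio=2.0):
    """Split total_eps over `depth` tree levels in geometric proportion.

    Level d receives a share proportional to ratio**d (assumed direction:
    deeper levels get more budget). The shares sum to total_eps.
    """
    weights = [ratio ** d for d in range(depth)]
    total = sum(weights)
    return [total_eps * w / total for w in weights]


def private_median(values, eps, lo, hi, rng):
    """Draw an approximate median of `values` on [lo, hi].

    Uses the exponential mechanism with a rank-based utility
    u(x) = -|rank(x) - n/2|, which has sensitivity 1.
    """
    if not lo < hi:
        raise ValueError("need lo < hi")
    # Clip to the public range so the output domain is data independent.
    xs = sorted(min(max(v, lo), hi) for v in values)
    n = len(xs)
    edges = [lo] + xs + [hi]
    # Interval i = [edges[i], edges[i + 1]] has exactly i points to its left.
    log_w = []
    for i in range(n + 1):
        length = edges[i + 1] - edges[i]
        if length > 0:
            score = -abs(i - n / 2)
            log_w.append(math.log(length) + eps * score / 2)
        else:
            log_w.append(-math.inf)  # empty interval, never selected
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_w)
    weights = [math.exp(w - top) for w in log_w]
    i = rng.choices(range(n + 1), weights=weights)[0]
    return rng.uniform(edges[i], edges[i + 1])


if __name__ == "__main__":
    rng = random.Random(0)
    per_level = geometric_budget(total_eps=1.0, depth=3)
    feature = [rng.random() for _ in range(200)]
    split = private_median(feature, per_level[0], 0.0, 1.0, rng)
    print(per_level, split)
```

In this sketch each tree level would spend its own share of the budget on the split points at that level. How the paper actually composes the budget across levels, trees, and leaf labels is not stated in the abstract, so that accounting is left out.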

Cite

Text

Huang et al. "Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022. doi:10.1007/978-3-031-26412-2_36

Markdown

[Huang et al. "Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.](https://mlanthology.org/ecmlpkdd/2022/huang2022ecmlpkdd-noiseefficient/) doi:10.1007/978-3-031-26412-2_36

BibTeX

@inproceedings{huang2022ecmlpkdd-noiseefficient,
  title     = {{Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles}},
  author    = {Huang, Zhanliang and Lei, Yunwen and Kabán, Ata},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2022},
  pages     = {587--603},
  doi       = {10.1007/978-3-031-26412-2_36},
  url       = {https://mlanthology.org/ecmlpkdd/2022/huang2022ecmlpkdd-noiseefficient/}
}