Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles
Abstract
Differentially private decision tree algorithms have been popular since the introduction of differential privacy. While many private tree-based algorithms have been proposed for supervised learning tasks, such as classification, very few extend naturally to the semi-supervised setting. In this paper, we present a framework that takes advantage of unlabelled data to reduce the noise requirement in differentially private decision forests and improve their predictive performance. The main ingredients of our approach are a median splitting criterion that creates balanced leaves, a geometric privacy budget allocation technique, and a random sampling technique to compute the private splitting point accurately. While similar ideas existed in isolation, their combination is new and has several advantages: (1) The semi-supervised mode of operation comes for free. (2) Our framework is applicable in two different privacy settings: when label privacy is required, and when privacy of the features is also required. (3) Empirical evidence on 18 UCI data sets and 3 synthetic data sets demonstrates that our algorithm achieves high utility compared to the current state of the art in both supervised and semi-supervised classification problems.
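To make two of the ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that "geometric privacy budget allocation" means splitting a total budget across tree levels in a geometric progression, and that a private median split can be chosen with the standard exponential mechanism over candidate points sampled independently of the data. The function names `geometric_budget` and `private_median`, the `ratio` parameter, the replace-one neighbouring relation, and the public bounds `lower` and `upper` are all hypothetical choices made for illustration.

```python
import math
import random


def geometric_budget(total_epsilon, depth, ratio=2.0):
    """Split total_epsilon over `depth` levels; level d gets a share
    proportional to ratio**d (assumed form of a geometric allocation)."""
    weights = [ratio ** d for d in range(depth)]
    scale = total_epsilon / sum(weights)
    return [w * scale for w in weights]


def private_median(values, epsilon, lower, upper, num_candidates=50, rng=None):
    """Pick an approximate median split point with the exponential mechanism.

    Candidates are drawn uniformly from the public range [lower, upper],
    independently of the data. The score of a candidate c is
    -|#{v < c} - n/2|, which changes by at most 1 when one record is
    replaced (sensitivity 1 under that assumed neighbouring relation).
    """
    rng = rng or random.Random()
    candidates = [rng.uniform(lower, upper) for _ in range(num_candidates)]
    n = len(values)
    scores = [-abs(sum(v < c for v in values) - n / 2) for c in candidates]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    weights = [math.exp(epsilon * (s - max_score) / 2.0) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    budgets = geometric_budget(total_epsilon=1.0, depth=3)
    data = list(range(101))
    split = private_median(data, budgets[0], 0.0, 100.0, rng=random.Random(0))
    print(budgets, split)
```

Whether more budget should go to shallow or deep levels depends on the method; here `ratio > 1` favours deeper levels and `ratio < 1` favours the root, and neither choice is claimed to match the paper.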
Cite
Text
Huang et al. "Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022. doi:10.1007/978-3-031-26412-2_36
Markdown
[Huang et al. "Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2022.](https://mlanthology.org/ecmlpkdd/2022/huang2022ecmlpkdd-noiseefficient/) doi:10.1007/978-3-031-26412-2_36
BibTeX
@inproceedings{huang2022ecmlpkdd-noiseefficient,
title = {{Noise-Efficient Learning of Differentially Private Partitioning Machine Ensembles}},
author = {Huang, Zhanliang and Lei, Yunwen and Kabán, Ata},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2022},
pages = {587--603},
doi = {10.1007/978-3-031-26412-2_36},
url = {https://mlanthology.org/ecmlpkdd/2022/huang2022ecmlpkdd-noiseefficient/}
}