Label Privacy Source Coding in Vertical Federated Learning
Abstract
We study label privacy protection in vertical federated learning (VFL). VFL enables an active party who possesses labeled data to improve model performance (utility) by collaborating with passive parties who have auxiliary features. Recently, there has been growing concern about protecting label privacy against passive parties who may surreptitiously deduce private labels from the output of their bottom models. In contrast to existing defense methods that focus on training-phase perturbation, we propose a novel offline-phase cleansing approach that protects label privacy while barely compromising utility. Specifically, we first formulate a Label Privacy Source Coding (LPSC) problem to remove from the labels the redundant label information already contained in the active party’s features, by assigning each sample a new weight and label (i.e., residual) for federated training. We theoretically demonstrate that LPSC 1) satisfies $\epsilon$-mutual information privacy ($\epsilon$-MIP) and 2) can be reduced to gradient boosting’s objective and thereby efficiently optimized. Therefore, we propose a gradient boosting-based LPSC method to protect label privacy. Moreover, given that LPSC only provides bounded privacy enhancement, we further introduce the two-phase LPSC+ framework, which enables a flexible privacy-utility trade-off by incorporating training-phase perturbation methods, such as adversarial training. Experimental results on four real-world datasets substantiate the efficacy of LPSC and the superiority of our LPSC+ framework.
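The offline step described in the abstract (replacing each label with a residual and a per-sample weight derived from the active party's own features) can be illustrated with a small sketch. This is not the authors' implementation: it assumes binary labels, a local logistic model fitted by plain gradient descent, and the standard second-order gradient boosting quantities for log loss (Hessian as weight, Newton working response as residual). The function names `fit_local_logistic` and `boosting_residuals` and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_local_logistic(X, y, lr=0.1, steps=500):
    """Fit a logistic model on the active party's own features only
    (hypothetical helper, plain batch gradient descent)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        grad = p - y                      # gradient of log loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def boosting_residuals(X, y, w, b, eps=1e-6):
    """Return (residual, weight) per sample, assuming the usual
    second-order boosting step for log loss."""
    p = sigmoid(X @ w + b)
    weight = np.clip(p * (1.0 - p), eps, None)  # Hessian of log loss
    residual = (y - p) / weight                 # Newton working response
    return residual, weight

# Synthetic data standing in for the active party's features and labels.
rng = np.random.default_rng(0)
X_active = rng.normal(size=(200, 3))
y = (X_active[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)

w, b = fit_local_logistic(X_active, y)
residual, weight = boosting_residuals(X_active, y, w, b)
```

In this sketch, `residual` and `weight` rather than `y` would be handed to the federated training step. The toy example only shows the bookkeeping; it does not establish the $\epsilon$-MIP guarantee stated in the abstract.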
Cite
Text
Gao et al. "Label Privacy Source Coding in Vertical Federated Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70341-6_19
Markdown
[Gao et al. "Label Privacy Source Coding in Vertical Federated Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/gao2024ecmlpkdd-label/) doi:10.1007/978-3-031-70341-6_19
BibTeX
@inproceedings{gao2024ecmlpkdd-label,
title = {{Label Privacy Source Coding in Vertical Federated Learning}},
author = {Gao, Dashan and Wan, Sheng and Gu, Hanlin and Fan, Lixin and Yao, Xin and Yang, Qiang},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2024},
pages = {313-331},
doi = {10.1007/978-3-031-70341-6_19},
url = {https://mlanthology.org/ecmlpkdd/2024/gao2024ecmlpkdd-label/}
}