Self-Supervised Vertical Federated Learning
Abstract
We consider a system where parties store vertically-partitioned data with a partially overlapping sample space, and a server stores labels for a subset of the data samples. Supervised Vertical Federated Learning (VFL) algorithms are limited to training models on the overlapping labeled data only, which can lead to poor model performance or bias. Self-supervised learning has been shown to be effective for training on unlabeled data, but current methods do not generalize to the vertically-partitioned setting. We propose a novel extension of self-supervised learning to VFL (SS-VFL), in which unlabeled data is used to train representation networks and labeled data is used to train a downstream prediction network. We present two SS-VFL algorithms: SS-VFL-I is a two-phase algorithm that requires only one round of communication, while SS-VFL-C adds communication rounds to improve model generalization. We show that both SS-VFL algorithms can achieve up to $2\times$ higher accuracy than supervised VFL when labeled data is scarce, at significantly reduced communication cost.
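The two-phase structure described for SS-VFL-I can be illustrated with a small sketch. The following is a minimal, hypothetical example, not the authors' implementation: it assumes two parties, synthetic data, and a simple autoencoder as a stand-in for each party's self-supervised objective. Each party first trains its representation network locally on unlabeled data, then sends embeddings of the labeled overlap to the server in a single communication round, and the server trains the downstream prediction network on its labels.

```python
# Minimal sketch of the SS-VFL-I two-phase pattern (illustrative, not the paper's code).
# Assumptions: 2 parties with vertically partitioned features; an autoencoder stands in
# for each party's self-supervised objective; all data here is synthetic.
import torch
import torch.nn as nn

torch.manual_seed(0)
N_UNLABELED, N_LABELED, D, EMB = 1000, 50, 10, 4

parties = []
for _ in range(2):
    x_unlab = torch.randn(N_UNLABELED, D)  # party-local unlabeled features
    x_lab = torch.randn(N_LABELED, D)      # features of the labeled overlap
    parties.append((x_unlab, x_lab))
y = torch.randint(0, 2, (N_LABELED,))      # labels held by the server

# Phase 1: each party trains its representation network locally (no communication).
encoders = []
for x_unlab, _ in parties:
    enc = nn.Sequential(nn.Linear(D, EMB), nn.ReLU())
    dec = nn.Linear(EMB, D)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
    for _ in range(100):
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(x_unlab)), x_unlab)
        loss.backward()
        opt.step()
    encoders.append(enc)

# One communication round: each party sends embeddings of the labeled overlap.
with torch.no_grad():
    embeddings = torch.cat(
        [enc(x_lab) for enc, (_, x_lab) in zip(encoders, parties)], dim=1)

# Phase 2: the server trains the downstream prediction network on its labels.
head = nn.Linear(2 * EMB, 2)
opt = torch.optim.Adam(head.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(embeddings), y)
    loss.backward()
    opt.step()
print("train accuracy:", (head(embeddings).argmax(1) == y).float().mean().item())
```

SS-VFL-C, by contrast, would interleave further communication rounds so the representation and prediction networks continue to co-adapt; the sketch above captures only the single-round variant.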
Cite

Text:
Castiglia et al. "Self-Supervised Vertical Federated Learning." NeurIPS 2022 Workshops: Federated_Learning, 2022.

Markdown:
[Castiglia et al. "Self-Supervised Vertical Federated Learning." NeurIPS 2022 Workshops: Federated_Learning, 2022.](https://mlanthology.org/neuripsw/2022/castiglia2022neuripsw-selfsupervised/)

BibTeX:
@inproceedings{castiglia2022neuripsw-selfsupervised,
title = {{Self-Supervised Vertical Federated Learning}},
author = {Castiglia, Timothy and Wang, Shiqiang and Patterson, Stacy},
booktitle = {NeurIPS 2022 Workshops: Federated_Learning},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/castiglia2022neuripsw-selfsupervised/}
}