Self-Supervised Vertical Federated Learning

Abstract

We consider a system where parties store vertically-partitioned data with a partially overlapping sample space, and a server stores labels for a subset of the data samples. Supervised Vertical Federated Learning (VFL) algorithms are limited to training models on only the overlapping labeled data, which can lead to poor model performance or bias. Self-supervised learning has been shown to be effective for training on unlabeled data, but current methods do not generalize to the vertically-partitioned setting. We propose a novel extension of self-supervised learning to VFL (SS-VFL), in which unlabeled data is used to train representation networks and labeled data is used to train a downstream prediction network. We present two SS-VFL algorithms: SS-VFL-I is a two-phase algorithm that requires only one round of communication, while SS-VFL-C adds communication rounds to improve model generalization. We show that both SS-VFL algorithms can achieve up to $2\times$ higher accuracy than supervised VFL when labeled data is scarce, at significantly reduced communication cost.
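To make the two-phase structure concrete, the following is a minimal PyTorch sketch of the SS-VFL-I flow as described in the abstract: each party first pretrains its representation network locally on unlabeled data, then a single communication round sends embeddings of the labeled overlapping samples to the server, which trains the downstream prediction network. All names (`PartyEncoder`, `self_supervised_pretrain`), dimensions, and the denoising-reconstruction objective used as the self-supervised loss are illustrative assumptions, not the paper's actual architecture or SSL method.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the paper does not prescribe these.
FEATURE_DIM, EMBED_DIM, NUM_CLASSES, NUM_PARTIES = 20, 8, 2, 3

class PartyEncoder(nn.Module):
    """Each party's local representation network (assumed architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEATURE_DIM, 32), nn.ReLU(),
            nn.Linear(32, EMBED_DIM),
        )

    def forward(self, x):
        return self.net(x)

def self_supervised_pretrain(encoder, unlabeled_x, epochs=50):
    """Phase 1 (local, no communication): train the encoder with a
    self-supervised objective. A simple denoising-reconstruction loss
    is used here as a stand-in; the paper's SSL objective may differ."""
    decoder = nn.Linear(EMBED_DIM, FEATURE_DIM)
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(epochs):
        noisy = unlabeled_x + 0.1 * torch.randn_like(unlabeled_x)
        loss = nn.functional.mse_loss(decoder(encoder(noisy)), unlabeled_x)
        opt.zero_grad()
        loss.backward()
        opt.step()

# --- SS-VFL-I: two phases, one round of communication ---
encoders = [PartyEncoder() for _ in range(NUM_PARTIES)]

# Phase 1: each party pretrains locally on its own vertical partition
# of the (abundant) unlabeled samples. Synthetic data for illustration.
unlabeled = [torch.randn(256, FEATURE_DIM) for _ in range(NUM_PARTIES)]
for enc, x in zip(encoders, unlabeled):
    self_supervised_pretrain(enc, x)

# Single communication round: each party sends embeddings of the
# labeled, overlapping samples to the server.
labeled = [torch.randn(32, FEATURE_DIM) for _ in range(NUM_PARTIES)]
labels = torch.randint(0, NUM_CLASSES, (32,))
with torch.no_grad():
    embeddings = torch.cat(
        [enc(x) for enc, x in zip(encoders, labeled)], dim=1)

# Phase 2: the server trains a downstream prediction network on the
# concatenated, frozen representations.
head = nn.Linear(EMBED_DIM * NUM_PARTIES, NUM_CLASSES)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(100):
    loss = nn.functional.cross_entropy(head(embeddings), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Under this reading, SS-VFL-C would differ in Phase 2 by continuing to fine-tune the party encoders jointly with the prediction head, which requires additional embedding/gradient communication rounds but can improve generalization.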

Cite

Text

Castiglia et al. "Self-Supervised Vertical Federated Learning." NeurIPS 2022 Workshops: Federated_Learning, 2022.

Markdown

[Castiglia et al. "Self-Supervised Vertical Federated Learning." NeurIPS 2022 Workshops: Federated_Learning, 2022.](https://mlanthology.org/neuripsw/2022/castiglia2022neuripsw-selfsupervised/)

BibTeX

@inproceedings{castiglia2022neuripsw-selfsupervised,
  title     = {{Self-Supervised Vertical Federated Learning}},
  author    = {Castiglia, Timothy and Wang, Shiqiang and Patterson, Stacy},
  booktitle = {NeurIPS 2022 Workshops: Federated_Learning},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/castiglia2022neuripsw-selfsupervised/}
}