VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks

Abstract

Vertical Federated Learning (VFL) is a crucial paradigm for training machine learning models on feature-partitioned, distributed data. However, due to privacy restrictions, few public real-world VFL datasets exist for algorithm evaluation, and these represent a limited array of feature distributions. Existing benchmarks often resort to synthetic datasets, derived by arbitrarily splitting the features of a global dataset, which capture only a subset of feature distributions and therefore assess algorithm performance inadequately. This paper addresses these shortcomings by introducing two key factors affecting VFL performance, feature importance and feature correlation, and proposing associated evaluation metrics and dataset splitting methods. Additionally, we introduce a real VFL dataset to address the deficit in image-image VFL scenarios. Our comprehensive evaluation of cutting-edge VFL algorithms provides valuable insights for future research in the field.
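
To make "feature-partitioned" concrete, the following is a minimal sketch of a synthetic vertical split: every party holds the same samples but a disjoint subset of the feature columns. The function names `vertical_split` and `party_view`, the toy data, and the party weights are all hypothetical. The weighted variant is only a crude stand-in for an imbalanced split and is not the importance-based or correlation-based method the paper proposes.

```python
import random

def vertical_split(n_features, n_parties, weights=None, seed=0):
    """Assign each feature index to exactly one party.

    With weights=None every party is equally likely, which corresponds to
    the arbitrary (uniform random) split. Unequal weights (an assumption of
    this sketch) make some parties likely to receive more features.
    """
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * n_parties
    parties = [[] for _ in range(n_parties)]
    for j in range(n_features):
        k = rng.choices(range(n_parties), weights=weights)[0]
        parties[k].append(j)
    return parties

def party_view(rows, feature_ids):
    """Project every sample (row) onto the features one party holds."""
    return [[row[j] for j in feature_ids] for row in rows]

# Toy global dataset: 4 samples, 6 features (hypothetical values).
rows = [[i * 10 + j for j in range(6)] for i in range(4)]

balanced = vertical_split(6, 3)
skewed = vertical_split(6, 3, weights=[8.0, 1.0, 1.0])

# Each party's local table keeps all 4 samples but only its own columns.
views = [party_view(rows, ids) for ids in balanced]
```

In this sketch the parties' views together cover every feature exactly once, so joining them column-wise on the shared sample order recovers the global table. Real VFL settings would additionally need sample alignment and privacy protections, which are out of scope here.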

Cite

Text

Wu et al. "VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks." International Conference on Learning Representations, 2024.

Markdown

[Wu et al. "VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/wu2024iclr-vertibench/)

BibTeX

@inproceedings{wu2024iclr-vertibench,
  title     = {{VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks}},
  author    = {Wu, Zhaomin and Hou, Junyi and He, Bingsheng},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/wu2024iclr-vertibench/}
}