HaCore: Efficient Coreset Construction with Locality Sensitive Hashing for Vertical Federated Learning

Abstract

Vertical federated learning (VFL) trains a model when the features of data samples are scattered over multiple clients. To improve efficiency, a promising approach is to find a coreset of the data samples and use it as a smaller training set. However, existing methods produce a large coreset when there are many clients and have long running times. To address these problems, we propose HaCore for efficient coreset construction in the VFL setting. HaCore first employs locality sensitive hashing (LSH) to map features to bit signatures locally on the clients, and then merges the local signatures for k-medoids clustering. Data samples that correspond to the medoids are added to the coreset. The core idea is that the distance between original data samples can be approximated by the Hamming distance between their LSH-based bit signatures. To accelerate k-medoids, we use an inverted index to search for the nearest medoid and a bit-counting method to quickly compute the aggregate distance from many signatures to a medoid. We evaluate HaCore on 5 datasets and compare it with state-of-the-art coreset construction methods for VFL. The results show that HaCore runs over 45x faster than the best-performing baseline and matches the accuracy of training with all samples.
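The pipeline in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes sign-random-projection LSH (SimHash) as the hash family, concatenation of per-client signatures as the "merge" step, and a simple alternating k-medoids loop. The nearest-medoid search is brute force here rather than the inverted index the paper describes. The per-bit counting in `aggregate_distance` is one plausible way to realize a bit-counting aggregate distance. All function names, seeds, client splits, and the synthetic data are hypothetical.

```python
import numpy as np


def lsh_signatures(features: np.ndarray, n_bits: int, seed: int) -> np.ndarray:
    """Assumed LSH family: random hyperplanes (SimHash), one bit per hyperplane.
    Hamming distance between such signatures tracks the angle between samples."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], n_bits))
    return (features @ planes > 0).astype(np.uint8)  # shape (n_samples, n_bits)


def hamming_to_all(sigs: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Hamming distance from every signature row to one query signature."""
    return np.count_nonzero(sigs != query, axis=1)


def aggregate_distance(bit_counts: np.ndarray, n: int, candidate: np.ndarray) -> int:
    """Sum of Hamming distances from n signatures to `candidate`, using only the
    per-bit counts of ones: a 1-bit in the candidate disagrees with the (n - count)
    zeros at that position, and a 0-bit disagrees with the `count` ones."""
    return int(np.where(candidate == 1, n - bit_counts, bit_counts).sum())


def k_medoids(sigs: np.ndarray, k: int, n_iter: int = 10, seed: int = 0) -> np.ndarray:
    """Alternating k-medoids on bit signatures; returns the row indices of the medoids."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(sigs), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: brute-force nearest medoid (the paper uses an inverted index).
        dists = np.stack([hamming_to_all(sigs, sigs[m]) for m in medoids], axis=1)
        labels = dists.argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue
            # Per-bit counts let each candidate's total cost be computed in O(n_bits).
            counts = sigs[members].sum(axis=0, dtype=np.int64)
            costs = [aggregate_distance(counts, members.size, sigs[i]) for i in members]
            new_medoids[c] = members[int(np.argmin(costs))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids


# Hypothetical VFL setup: 3 clients, each holding a disjoint block of feature columns.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 12))
client_blocks = np.array_split(X, 3, axis=1)

# Each client hashes its own features locally; only bit signatures leave the client.
local_sigs = [lsh_signatures(block, n_bits=16, seed=i) for i, block in enumerate(client_blocks)]
merged = np.concatenate(local_sigs, axis=1)  # assumed merge: concatenate signatures

# The coreset is the set of sample IDs whose signatures were chosen as medoids.
coreset_idx = k_medoids(merged, k=10)
```

The per-bit counts are what make the medoid update cheap in this sketch. Scoring every member of a cluster as a candidate medoid costs O(cluster size × bits) rather than the O(cluster size² × bits) of pairwise Hamming sums. In a real VFL deployment, each client would keep its own rows of `X` and simply select the rows indexed by `coreset_idx`.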

Cite

Text

Zhang et al. "HaCore: Efficient Coreset Construction with Locality Sensitive Hashing for Vertical Federated Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34409

BibTeX

@inproceedings{zhang2025aaai-hacore,
  title     = {{HaCore: Efficient Coreset Construction with Locality Sensitive Hashing for Vertical Federated Learning}},
  author    = {Zhang, Qinbo and Yan, Xiao and Ding, Yukai and Fu, Fangcheng and Xu, Quanqing and Li, Ziyi and Hu, Chuang and Jiang, Jiawei},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {22515--22523},
  doi       = {10.1609/AAAI.V39I21.34409},
  url       = {https://mlanthology.org/aaai/2025/zhang2025aaai-hacore/}
}