Fair and Privacy-Preserving Synthetic Data Generation via Clustering-Based Variational Autoencoder and Adversarially Debiased Wasserstein Generative Adversarial Networks with Gradient Penalty

Adouani, Malek; Dagdia, Zaineb Chelly

doi:10.1007/978-3-032-05962-8_12

ECML-PKDD 2025 pp. 195-212

Abstract

The increasing reliance on machine learning in sensitive domains such as healthcare has amplified concerns about bias and privacy in data-driven decision-making. Fairness-aware generative models aim to mitigate bias, but they often depend on labeled data, which limits their applicability in unsupervised settings. Conversely, differentially private generative models ensure privacy but may still encode hidden biases. Existing methods fail to jointly optimize fairness and privacy without explicit supervision. To address this gap, we propose a hybrid generative framework that integrates a clustering-based Variational Autoencoder (VAE) with a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) to generate fair and privacy-preserving synthetic data. The VAE learns structured latent representations under zero-Concentrated Differential Privacy (zCDP) and applies K-Means clustering directly in its latent space. In unsupervised settings, where no labels are available, the clusters steer the generative process toward samples that resemble the real data. These structured representations, together with the cluster labels, then guide the WGAN-GP generator during sample generation and strengthen adversarial debiasing through the Fairness Critic, which penalizes correlations between the synthetic data and sensitive attributes to ensure fairness. By integrating clustering-based VAEs with WGAN-GP, our framework enforces fairness while maintaining strong privacy guarantees. Experimental results show that it outperforms existing generative models, reducing bias, preserving privacy, and maintaining high data utility across multiple fairness and privacy metrics.
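
The abstract states that the VAE is trained under zero-Concentrated Differential Privacy (zCDP) but does not give the accounting. The sketch below shows standard zCDP bookkeeping under stated assumptions: a DP-SGD-style setup in which each step adds Gaussian noise to a sum of per-example gradients clipped to L2 norm `clip_norm`, with add/remove-one adjacency. Under those assumptions a Gaussian release with sensitivity Δ and noise σ is ρ-zCDP with ρ = Δ²/(2σ²), ρ adds up across steps, and ρ-zCDP implies (ρ + 2√(ρ ln(1/δ)), δ)-DP. The function names and the `noise_multiplier` and `steps` parameters are hypothetical, not taken from the paper. The sketch also ignores privacy amplification by subsampling, so its ε is conservative.

```python
import math


def gaussian_rho(sensitivity: float, sigma: float) -> float:
    """rho-zCDP of one Gaussian-mechanism release with L2 sensitivity and noise std sigma."""
    if sensitivity <= 0 or sigma <= 0:
        raise ValueError("sensitivity and sigma must be positive")
    return sensitivity ** 2 / (2.0 * sigma ** 2)


def zcdp_to_eps(rho: float, delta: float) -> float:
    """Convert rho-zCDP to (eps, delta)-DP via eps = rho + 2 * sqrt(rho * ln(1/delta))."""
    if rho < 0 or not 0.0 < delta < 1.0:
        raise ValueError("need rho >= 0 and 0 < delta < 1")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


def dp_vae_budget(clip_norm: float, noise_multiplier: float, steps: int, delta: float):
    """Hypothetical accountant for a DP-SGD-style VAE (no subsampling amplification).

    Each step releases sum(clipped per-example grads) + N(0, (noise_multiplier * clip_norm)^2 I).
    Under add/remove-one adjacency that sum has L2 sensitivity clip_norm.
    """
    rho_step = gaussian_rho(clip_norm, noise_multiplier * clip_norm)
    rho_total = steps * rho_step  # zCDP composes additively across steps
    return rho_total, zcdp_to_eps(rho_total, delta)


if __name__ == "__main__":
    rho, eps = dp_vae_budget(clip_norm=1.0, noise_multiplier=10.0, steps=200, delta=1e-5)
    print(f"rho = {rho:.3f}, eps = {eps:.3f} at delta = 1e-5")
```

With these illustrative settings the per-step ρ is 1/200, so 200 steps give ρ = 1 and ε ≈ 7.79 at δ = 10⁻⁵. Working in ρ keeps composition a simple sum, and the conversion to (ε, δ) happens only once at the end.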
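
The abstract describes K-Means clustering applied directly in the VAE's latent space, with the cluster labels passed to the WGAN-GP generator alongside the latent representations. The following is a minimal, dependency-free sketch of that step. It assumes plain Lloyd's algorithm on latent vectors and a generator input built by concatenating noise, a latent code, and a one-hot cluster label. The names `kmeans` and `generator_input` and the concatenation scheme are illustrative assumptions; the abstract does not specify the exact conditioning mechanism.

```python
import random
from typing import List, Sequence, Tuple


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(latents: List[Sequence[float]], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's K-Means on latent vectors; returns (cluster labels, centroids)."""
    if not 1 <= k <= len(latents):
        raise ValueError("k must be between 1 and the number of latent vectors")
    rng = random.Random(seed)
    # Initialise centroids with k latent vectors sampled without replacement.
    centroids = [list(z) for z in rng.sample(latents, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each latent vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(z, centroids[j]))
                      for z in latents]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [z for z, c in zip(latents, labels) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def generator_input(noise: Sequence[float], latent: Sequence[float],
                    label: int, k: int) -> List[float]:
    """Hypothetical conditioning: concatenate noise, latent code and one-hot label."""
    one_hot = [0.0] * k
    one_hot[label] = 1.0
    return list(noise) + list(latent) + one_hot
```

For example, after encoding a batch to latent vectors `zs`, calling `labels, _ = kmeans(zs, k=4)` and then `generator_input(noise, zs[i], labels[i], 4)` produces one conditioned generator input per record. The value k = 4 is arbitrary here.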
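
The abstract names WGAN-GP as the adversarial backbone. The standard WGAN-GP critic objective is L_D = E[D(x̃)] − E[D(x)] + λ E[(‖∇D(x̂)‖₂ − 1)²], where x̃ is a generated sample, x a real one, and x̂ = εx + (1 − ε)x̃ with ε ~ U[0, 1]. The sketch below is dependency-free and uses a tiny, hypothetical one-hidden-layer tanh critic whose input gradient is computed analytically; a real implementation would use an autograd framework instead. The default λ = 10 comes from the original WGAN-GP work and is not a value reported by Adouani and Dagdia.

```python
import math
import random
from typing import List, Optional, Sequence

Vec = Sequence[float]


class TinyCritic:
    """Hypothetical critic D(x) = sum_h a_h * tanh(w_h . x + b_h)."""

    def __init__(self, w: List[List[float]], b: List[float], a: List[float]):
        self.w, self.b, self.a = w, b, a

    def _pre(self, x: Vec) -> List[float]:
        return [sum(wi * xi for wi, xi in zip(w_h, x)) + b_h
                for w_h, b_h in zip(self.w, self.b)]

    def score(self, x: Vec) -> float:
        return sum(a_h * math.tanh(u) for a_h, u in zip(self.a, self._pre(x)))

    def input_grad(self, x: Vec) -> List[float]:
        # dD/dx = sum_h a_h * (1 - tanh(u_h)^2) * w_h, with u_h = w_h . x + b_h
        grad = [0.0] * len(x)
        for a_h, w_h, u in zip(self.a, self.w, self._pre(x)):
            scale = a_h * (1.0 - math.tanh(u) ** 2)
            for i, wi in enumerate(w_h):
                grad[i] += scale * wi
        return grad


def wgan_gp_critic_loss(critic: TinyCritic, real: List[Vec], fake: List[Vec],
                        lam: float = 10.0, rng: Optional[random.Random] = None) -> float:
    """E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)||_2 - 1)^2]."""
    if not real or len(real) != len(fake):
        raise ValueError("need equally sized, non-empty real and fake batches")
    rng = rng or random.Random(0)
    n = len(real)
    w_term = (sum(critic.score(x) for x in fake) - sum(critic.score(x) for x in real)) / n
    penalty = 0.0
    for x_r, x_f in zip(real, fake):
        eps = rng.random()  # one interpolation weight per real/fake pair
        x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(x_r, x_f)]
        g = critic.input_grad(x_hat)
        penalty += (math.sqrt(sum(gi * gi for gi in g)) - 1.0) ** 2
    return w_term + lam * penalty / n
```

The critic minimizes this loss, while the generator minimizes −E[D(x̃)]. The penalty replaces weight clipping as a soft way to keep the critic approximately 1-Lipschitz along lines between real and generated samples.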
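
The abstract says the Fairness Critic penalizes correlations between synthetic data and sensitive attributes. The critic itself is adversarial and its architecture is not described here, so the sketch below is only a simplified, non-adversarial stand-in for the quantity it targets: the sum of squared Pearson correlations between each synthetic feature and a sensitive attribute. It assumes each synthetic record is paired with a numeric sensitive-attribute value. The name `correlation_penalty` and the squared-correlation form are assumptions for illustration.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_penalty(samples: List[Sequence[float]], sensitive: Sequence[float]) -> float:
    """Hypothetical fairness term: sum over features of corr(feature, sensitive)^2."""
    features = list(zip(*samples))  # column-wise view of the synthetic batch
    return sum(pearson(col, sensitive) ** 2 for col in features)
```

A feature that encodes the sensitive attribute exactly contributes 1, and an uncorrelated feature contributes 0. In training, such a term would be computed on tensors inside an autograd framework and added, with a weight, to the generator loss; an adversarial critic can additionally catch nonlinear dependence that linear correlation misses.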

Cite

Text

Adouani and Dagdia. "Fair and Privacy-Preserving Synthetic Data Generation via Clustering-Based Variational Autoencoder and Adversarially Debiased Wasserstein Generative Adversarial Networks with Gradient Penalty." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-05962-8_12

Markdown

[Adouani and Dagdia. "Fair and Privacy-Preserving Synthetic Data Generation via Clustering-Based Variational Autoencoder and Adversarially Debiased Wasserstein Generative Adversarial Networks with Gradient Penalty." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/adouani2025ecmlpkdd-fair/) doi:10.1007/978-3-032-05962-8_12

BibTeX

@inproceedings{adouani2025ecmlpkdd-fair,
  title     = {{Fair and Privacy-Preserving Synthetic Data Generation via Clustering-Based Variational Autoencoder and Adversarially Debiased Wasserstein Generative Adversarial Networks with Gradient Penalty}},
  author    = {Adouani, Malek and Dagdia, Zaineb Chelly},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {195--212},
  doi       = {10.1007/978-3-032-05962-8_12},
  url       = {https://mlanthology.org/ecmlpkdd/2025/adouani2025ecmlpkdd-fair/}
}