Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation

Abstract

Synthetic data generation (SDG) has become a popular approach for releasing private datasets. In SDG, a generative model is fitted on the private real data, and samples drawn from the model are released as the protected synthetic data. While real-world datasets usually consist of multiple tables with potential many-to-many relationships (i.e., many-to-many datasets), recent research in SDG mostly focuses on modeling tables independently or only considers generating datasets with special cases of many-to-many relationships, such as one-to-many. In this paper, we first study the challenges of building faithful generative models for many-to-many datasets and identify limitations of existing methods. We then present a novel factorization for many-to-many generative models, which leads to a scalable generation framework by combining recent results from random graph theory and representation learning. Finally, we extend the framework to provide $(\epsilon,\delta)$-differential privacy guarantees. On a real-world dataset, we demonstrate that our method generates synthetic datasets that preserve information within and across tables better than its closest competitor.
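To make the many-to-many setting concrete, the sketch below treats the link table between two tables as the edge list of a bipartite graph and samples a new link table from per-row link counts. It is an illustration only, not the method proposed in the paper: it uses a simple stub-matching (configuration-model style) sampler as a generic stand-in for a random graph model, it carries no privacy guarantee, and the student/course keys and the function name `sample_bipartite_links` are hypothetical.

```python
import random
from collections import Counter


def sample_bipartite_links(left_degrees, right_degrees, seed=0):
    """Sample a link table from per-row link counts on each side.

    Generic stub-matching sketch (an assumption for illustration),
    not the paper's generative model.
    """
    if sum(left_degrees.values()) != sum(right_degrees.values()):
        raise ValueError("both sides must have the same total number of links")
    rng = random.Random(seed)
    # One "stub" per link that a row participates in.
    left_stubs = [key for key, deg in left_degrees.items() for _ in range(deg)]
    right_stubs = [key for key, deg in right_degrees.items() for _ in range(deg)]
    rng.shuffle(right_stubs)
    # Random pairing can produce duplicate links; the set drops them,
    # so sampled degrees may fall slightly below the targets.
    return sorted(set(zip(left_stubs, right_stubs)))


# Hypothetical many-to-many link table: students enrolled in courses.
real_links = [("s1", "c1"), ("s1", "c2"), ("s2", "c1"), ("s3", "c2"), ("s3", "c3")]

# Degree sequences: how many links each row on either side has.
left_deg = Counter(left for left, _ in real_links)
right_deg = Counter(right for _, right in real_links)

synthetic_links = sample_bipartite_links(left_deg, right_deg, seed=0)
print(synthetic_links)
```

In a full pipeline, the rows of each table would then be generated conditioned on the sampled links; that step is omitted here because the abstract does not specify it.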

Cite

Text

Xu et al. "Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation." International Conference on Learning Representations, 2023.

Markdown

[Xu et al. "Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/xu2023iclr-synthetic/)

BibTeX

@inproceedings{xu2023iclr-synthetic,
  title     = {{Synthetic Data Generation of Many-to-Many Datasets via Random Graph Generation}},
  author    = {Xu, Kai and Ganev, Georgi and Joubert, Emile and Davison, Rees and Van Acker, Olivier and Robinson, Luke},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/xu2023iclr-synthetic/}
}