Continuous Diffusion for Mixed-Type Tabular Data
Abstract
Score-based generative models, or diffusion models, have proven successful at generating text and images across many domains. However, their application to mixed-type tabular data has so far fallen short. Existing research mainly combines continuous and categorical diffusion processes and does not explicitly account for the feature heterogeneity inherent to tabular data. In this paper, we combine score matching and score interpolation to enforce a common type of continuous noise distribution that affects both continuous and categorical features. Further, we investigate the impact of distinct noise schedules per feature or per data type. We allow for adaptive, learnable noise schedules to ensure optimally allocated model capacity and balanced generative capability. Results show that our model consistently outperforms the benchmark models and that accounting for heterogeneity within the noise schedule design boosts sample quality.
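The abstract's central idea places every feature in a continuous space and corrupts it with the same kind of Gaussian noise. The minimal NumPy sketch below illustrates it under several assumptions that do not come from the paper. It uses a variance-exploding forward process and a hand-set geometric noise schedule with hypothetical per-feature endpoints. The paper's schedules are adaptive and learnable, but this sketch does not learn them. The categorical feature uses random class embeddings. Continuous features are trained with denoising score matching. For the categorical feature, score interpolation turns predicted class probabilities into a score by averaging the class embeddings. The helper names (`feature_sigmas`, `add_noise`, `dsm_loss`, `interpolated_score`) are illustrative and are not the authors' code.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def feature_sigmas(t, sigma_min, sigma_max):
    """Hypothetical per-feature schedule: geometric interpolation between
    feature-specific sigma_min and sigma_max at diffusion time t in [0, 1]."""
    return sigma_min ** (1.0 - t) * sigma_max ** t


def add_noise(x0, sigma, rng):
    """Gaussian forward process x_t = x_0 + sigma * eps (sigma broadcasts per column)."""
    eps = rng.standard_normal(x0.shape)
    return x0 + sigma * eps, eps


def dsm_loss(score_pred, eps, sigma):
    """Denoising score matching for continuous features: the conditional
    score of x_t given x_0 is -eps / sigma; the squared error is weighted by sigma^2."""
    target = -eps / sigma
    return np.mean(sigma**2 * (score_pred - target) ** 2)


def interpolated_score(logits, embeddings, x_t, sigma):
    """Score interpolation for one categorical feature: estimate E[x_0 | x_t]
    as the probability-weighted mean of the class embeddings, then plug it
    into the Gaussian score (E[x_0 | x_t] - x_t) / sigma^2."""
    probs = softmax(logits)          # (batch, n_classes)
    x0_hat = probs @ embeddings      # (batch, embed_dim)
    return (x0_hat - x_t) / sigma**2


# Usage: 4 rows, 2 continuous features and 1 categorical feature with 3 levels.
rng = np.random.default_rng(0)
x_cont = rng.standard_normal((4, 2))
cat_idx = np.array([0, 2, 1, 2])
embeddings = rng.standard_normal((3, 2))  # hypothetical 2-d class embeddings
x_cat = embeddings[cat_idx]

t = 0.5
sig = feature_sigmas(t, sigma_min=np.array([0.01, 0.01, 0.01]),
                     sigma_max=np.array([80.0, 40.0, 20.0]))
xt_cont, eps_cont = add_noise(x_cont, sig[:2], rng)
xt_cat, _ = add_noise(x_cat, sig[2], rng)

# Stand-ins for network outputs; a real model would predict these from (x_t, t).
loss_cont = dsm_loss(np.zeros_like(xt_cont), eps_cont, sig[:2])
score_cat = interpolated_score(rng.standard_normal((4, 3)), embeddings, xt_cat, sig[2])
```

Both feature types see the same Gaussian corruption, so their noisy values live in one continuous space. Giving each column its own `sigma_max` is the simplest hand-set version of the per-feature schedules the abstract describes.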
Cite
Mueller et al. "Continuous Diffusion for Mixed-Type Tabular Data." NeurIPS 2023 Workshops: SyntheticData4ML, 2023.
@inproceedings{mueller2023neuripsw-continuous,
title = {{Continuous Diffusion for Mixed-Type Tabular Data}},
author = {Mueller, Markus and Gruber, Kathrin and Fok, Dennis},
booktitle = {NeurIPS 2023 Workshops: SyntheticData4ML},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/mueller2023neuripsw-continuous/}
}