4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on RDBs

Abstract

Given a relational database (RDB), how can we predict missing column values in some target table of interest? Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and evaluation purposes. As a result, related model development thus far often defaults to tabular approaches trained on ubiquitous single-table benchmarks, or on the relational side, graph-based alternatives such as GNNs applied to a completely different set of graph datasets devoid of tabular characteristics. To more precisely target RDBs lying at the nexus of these two complementary regimes, we explore a broad class of baseline models predicated on: (i) converting multi-table datasets into graphs using various strategies equipped with efficient subsampling, while preserving tabular characteristics; and (ii) trainable models with well-matched inductive biases that output predictions based on these input subgraphs. Then, to address the dearth of suitable public benchmarks and reduce siloed comparisons, we assemble a diverse collection of (i) large-scale RDB datasets and (ii) coincident predictive tasks. From a delivery standpoint, we operationalize the above four dimensions (4D) of exploration within a unified, scalable open-source toolbox called 4DBInfer; please see https://github.com/awslabs/multi-table-benchmark .
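The two modeling steps named in the abstract, converting multi-table data into a graph and subsampling it before prediction, can be illustrated with a small sketch. This is not the 4DBInfer API or the authors' method. The tables `users` and `purchases`, the column names, and the helpers `rows_to_graph` and `sample_subgraph` are all hypothetical, and the row-as-node, foreign-key-as-edge construction is only one of several possible conversion strategies.

```python
import random
from collections import defaultdict

# Hypothetical toy RDB: two tables linked by a foreign key (not from the paper).
users = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": 27},
]
purchases = [
    {"purchase_id": 10, "user_id": 1, "amount": 9.5},
    {"purchase_id": 11, "user_id": 1, "amount": 20.0},
    {"purchase_id": 12, "user_id": 2, "amount": 3.25},
]


def rows_to_graph(parent_rows, parent_key, child_rows, child_key, fk):
    """Treat every row as a node and every foreign-key reference as an undirected edge."""
    features = {}                  # node id -> original row, so tabular features are kept
    neighbors = defaultdict(list)  # node id -> list of adjacent node ids
    for row in parent_rows:
        features[("parent", row[parent_key])] = row
    for row in child_rows:
        child = ("child", row[child_key])
        parent = ("parent", row[fk])
        features[child] = row
        neighbors[child].append(parent)
        neighbors[parent].append(child)
    return features, neighbors


def sample_subgraph(neighbors, seed, fanout, hops, rng):
    """Collect nodes within `hops` of `seed`, keeping at most `fanout` neighbors per node."""
    visited = {seed}
    frontier = [seed]
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            candidates = neighbors[node]
            picked = rng.sample(candidates, min(fanout, len(candidates)))
            for nb in picked:
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return visited


features, neighbors = rows_to_graph(users, "user_id", purchases, "purchase_id", "user_id")
subgraph = sample_subgraph(neighbors, ("parent", 1), fanout=1, hops=1, rng=random.Random(0))
```

A trainable model would then consume the sampled node set together with the row features stored in `features`; that part is omitted here because the abstract does not specify an architecture.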

Cite

Text

Wang et al. "4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on RDBs." Neural Information Processing Systems, 2024. doi:10.52202/079017-0856

Markdown

[Wang et al. "4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on RDBs." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wang2024neurips-4dbinfer/) doi:10.52202/079017-0856

BibTeX

@inproceedings{wang2024neurips-4dbinfer,
  title     = {{4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on RDBs}},
  author    = {Wang, Minjie and Gan, Quan and Wipf, David and Cai, Zhenkun and Li, Ning and Tang, Jianheng and Zhang, Yanlin and Zhang, Zizhao and Mao, Zunyao and Song, Yakun and Wang, Yanbo and Li, Jiahang and Zhang, Han and Yang, Guang and Qin, Xiao and Lei, Chuan and Zhang, Muhan and Zhang, Weinan and Faloutsos, Christos and Zhang, Zheng},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0856},
  url       = {https://mlanthology.org/neurips/2024/wang2024neurips-4dbinfer/}
}