Unsupervised Functional Dependency Discovery for Data Preparation
Abstract
We study functional dependency (FD) discovery as a way to impose domain knowledge on downstream data preparation tasks. We introduce a framework in which learning FDs corresponds to solving a sparse regression problem. We show that our methods scale to large data instances with millions of tuples and hundreds of attributes while recovering the true FDs across a diverse array of synthetic datasets, even in the presence of noisy data. Overall, our methods show an average 2× F1 improvement over state-of-the-art FD discovery methods. Our system also obtains a better F1 on a downstream data repairing task than manually defined FDs.
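The abstract does not spell out how FD discovery maps to sparse regression. The Python sketch below is one generic illustration built on our own assumptions, not the authors' method. Recall that an FD X → Y holds when every pair of tuples that agrees on X also agrees on Y. For each target attribute Y, the sketch regresses "the two tuples agree on Y" on "the two tuples agree on attribute A" over all tuple pairs with an L1 (lasso) penalty. It keeps attributes with positive weight as candidate determinants and then verifies the candidate FD exactly. All function names, the penalty `lam`, and the `1e-9` cutoff are hypothetical choices made for illustration.

```python
from itertools import combinations


def agreement_matrix(rows, attrs):
    """One row per tuple pair: 1.0 where the two tuples agree on an attribute, else 0.0."""
    return [[1.0 if r1[a] == r2[a] else 0.0 for a in attrs]
            for r1, r2 in combinations(rows, 2)]


def soft_threshold(z, t):
    """Proximal operator of t*|w|; this is what drives lasso weights to exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso(X, y, lam, sweeps=200):
    """Coordinate-descent lasso on centered data:
    minimize (1/2n) * ||yc - Xc w||^2 + lam * ||w||_1."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    cols = [[row[j] - means[j] for row in X] for j in range(d)]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual yc - Xc w for w = 0
    w = [0.0] * d
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            norm = sum(c * c for c in col) / n
            if norm == 0.0:  # constant column carries no signal
                continue
            # Correlation of column j with the residual, excluding w[j]'s own contribution.
            rho = sum(c * r for c, r in zip(col, resid)) / n + norm * w[j]
            new = soft_threshold(rho, lam) / norm
            if new != w[j]:
                resid = [r - (new - w[j]) * c for r, c in zip(resid, col)]
                w[j] = new
    return w


def fd_holds(rows, lhs, rhs):
    """Exact check of lhs -> rhs: equal lhs values must imply equal rhs values."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        if seen.setdefault(key, r[rhs]) != r[rhs]:
            return False
    return True


def discover_fds(rows, attrs, lam=0.01):
    """Screen determinants of each attribute with a lasso, then verify each candidate exactly."""
    if len(rows) < 2:
        return []
    fds = []
    for target in attrs:
        others = [a for a in attrs if a != target]
        X = agreement_matrix(rows, others)
        y = [1.0 if r1[target] == r2[target] else 0.0
             for r1, r2 in combinations(rows, 2)]
        w = lasso(X, y, lam)
        # Agreeing on a determinant should make agreement on the target more likely,
        # so only positive weights count as candidates.
        lhs = tuple(a for a, wj in zip(others, w) if wj > 1e-9)
        if lhs and fd_holds(rows, lhs, target):
            fds.append((lhs, target))
    return fds


rows = [
    {"zip": "53703", "city": "Madison", "name": "ann"},
    {"zip": "53703", "city": "Madison", "name": "bob"},
    {"zip": "60601", "city": "Chicago", "name": "ann"},
    {"zip": "60602", "city": "Chicago", "name": "bob"},
]
print(discover_fds(rows, ["zip", "city", "name"]))  # [(('zip',), 'city')]
```

On this toy table the lasso also proposes `city` as a determinant of `zip`, because the two Chicago tuples share a city but not a zip. The exact check rejects that candidate, so only `zip → city` survives. The exact check is deliberately strict and would reject an FD violated by a single noisy tuple, so this sketch does not reproduce the noise tolerance the abstract reports.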
Cite
Guo and Rekatsinas. "Unsupervised Functional Dependency Discovery for Data Preparation." ICLR 2019 Workshops: LLD, 2019.
BibTeX
@inproceedings{guo2019iclrw-unsupervised,
title = {{Unsupervised Functional Dependency Discovery for Data Preparation}},
author = {Guo, Zhihan and Rekatsinas, Theodoros},
booktitle = {ICLR 2019 Workshops: LLD},
year = {2019},
url = {https://mlanthology.org/iclrw/2019/guo2019iclrw-unsupervised/}
}