Revisiting Conditional Functional Dependency Discovery: Splitting the "c" from the "FD"
Abstract
Many techniques for cleaning dirty data are based on enforcing some set of integrity constraints. Conditional functional dependencies (CFDs) are a combination of traditional functional dependencies (FDs) and association rules, and are widely used as a constraint formalism for data cleaning. However, the discovery of such CFDs has received limited attention. In this paper, we regard CFDs as an extension of association rules, and present three general methodologies for (approximate) CFD discovery, each using a different way of combining pattern mining for discovering the conditions (the “C” in CFD) with FD discovery. We discuss how existing algorithms fit into these three methodologies, and introduce new techniques to improve the discovery process. We show that the right choice of methodology improves performance over the traditional CFD discovery method CTane. Code related to this paper is available at https://github.com/j-r77/cfddiscovery and https://codeocean.com/2018/06/20/discovering-conditional-functional-dependencies/code .
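To make the formalism concrete, the following is a minimal sketch of what it means for a CFD to hold on a table: a condition (a pattern of attribute values) selects a subset of tuples, and an FD is then checked on that subset only. It illustrates the definition, not any of the discovery methodologies from the paper. The function name `cfd_holds`, the attribute names, and the toy rows are all hypothetical and chosen only for illustration.

```python
def cfd_holds(rows, condition, lhs, rhs):
    """Check whether the FD lhs -> rhs holds on the rows matching `condition`.

    rows:      list of dicts (one dict per tuple); hypothetical toy data
    condition: dict of attribute -> required constant (the "C" part)
    lhs:       list of attribute names on the left-hand side of the FD
    rhs:       single attribute name on the right-hand side of the FD
    """
    seen = {}  # maps each LHS value combination to the first RHS value observed
    for row in rows:
        # Skip tuples that do not match the condition pattern.
        if any(row[attr] != value for attr, value in condition.items()):
            continue
        key = tuple(row[attr] for attr in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False  # same LHS values, different RHS value: violation
        seen.setdefault(key, row[rhs])
    return True


# Hypothetical example data, not taken from the paper.
rows = [
    {"country": "UK", "zip": "EH4", "street": "Mayfield"},
    {"country": "UK", "zip": "EH4", "street": "Mayfield"},
    {"country": "US", "zip": "07974", "street": "Mtn Ave"},
    {"country": "US", "zip": "07974", "street": "Tree Ave"},
]

# zip -> street holds when restricted to country = "UK" ...
uk_ok = cfd_holds(rows, {"country": "UK"}, ["zip"], "street")
# ... but the unconditional FD zip -> street is violated by the US tuples.
plain_fd_ok = cfd_holds(rows, {}, ["zip"], "street")
```

An empty condition reduces the check to a plain FD, which shows why the conditional form can capture regularities that hold only on part of the data.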
Cite
Text
Rammelaere and Geerts. "Revisiting Conditional Functional Dependency Discovery: Splitting the "c" from the "FD"." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2018. doi:10.1007/978-3-030-10928-8_33
Markdown
[Rammelaere and Geerts. "Revisiting Conditional Functional Dependency Discovery: Splitting the "c" from the "FD"." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2018.](https://mlanthology.org/ecmlpkdd/2018/rammelaere2018ecmlpkdd-revisiting/) doi:10.1007/978-3-030-10928-8_33
BibTeX
@inproceedings{rammelaere2018ecmlpkdd-revisiting,
title = {{Revisiting Conditional Functional Dependency Discovery: Splitting the "c" from the "FD"}},
author = {Rammelaere, Joeri and Geerts, Floris},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2018},
pages = {552-568},
doi = {10.1007/978-3-030-10928-8_33},
url = {https://mlanthology.org/ecmlpkdd/2018/rammelaere2018ecmlpkdd-revisiting/}
}