BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation
Abstract
An enormous effort is usually devoted to data wrangling, the tedious process of cleaning, transforming and combining data so that it is ready for modelling, visualisation or aggregation. Data transformation and formatting is one common task in data wrangling, which humans perform in two steps: (1) they recognise the specific domain of the data (dates, phones, addresses, etc.) and (2) they apply conversions that are specific to that domain. However, the mechanisms for manipulating one domain can be unique and very different from those of other domains. In this paper we present BK-ADAPT, a system that uses inductive programming (IP) with a dynamic background knowledge (BK) generated by a machine learning meta-model that selects the domain and/or the primitives from several descriptive features of the data wrangling problem. To show the performance of our method, we have created a web-based tool that allows users to provide a set of inputs and one or more examples of outputs, so that the rest of the examples are automatically transformed by the tool.
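The two-step idea in the abstract (pick a domain, then search for a transformation using only that domain's primitives) can be illustrated with a minimal Python sketch. This is not the BK-ADAPT implementation: the rule-based `detect_domain` merely stands in for the learned meta-model, the brute-force enumeration in `synthesise` stands in for the inductive programming system, and every name, primitive and domain below is a hypothetical chosen for illustration.

```python
import re
from itertools import product

# Hypothetical per-domain background knowledge (BK): name -> primitive.
BK = {
    "date": {
        "day": lambda s: re.split(r"[/-]", s)[0],
        "month": lambda s: re.split(r"[/-]", s)[1],
        "year": lambda s: re.split(r"[/-]", s)[2],
    },
    "email": {
        "user": lambda s: s.split("@")[0],
        "host": lambda s: s.split("@")[1],
        "upper": str.upper,
    },
    "generic": {"upper": str.upper, "lower": str.lower, "strip": str.strip},
}


def detect_domain(inputs):
    """Assumed stand-in for the meta-model: hand-written rules, not learned."""
    if all(re.fullmatch(r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}", s) for s in inputs):
        return "date"
    if all("@" in s for s in inputs):
        return "email"
    return "generic"


def synthesise(examples, primitives, max_depth=2):
    """Enumerate compositions of primitives, shortest first, and return the
    first one that reproduces every (input, output) example."""
    for depth in range(1, max_depth + 1):
        for names in product(primitives, repeat=depth):
            def program(s, names=names):
                for name in names:
                    s = primitives[name](s)
                return s
            try:
                if all(program(i) == o for i, o in examples):
                    return names, program
            except (IndexError, ValueError):
                continue  # this composition does not apply to the examples
    return None


def transform(inputs, examples):
    """Select the BK for the detected domain, then apply the found program."""
    domain = detect_domain(inputs)
    found = synthesise(examples, BK[domain])
    if found is None:
        raise ValueError("no program in the selected BK fits the examples")
    names, program = found
    return domain, names, [program(s) for s in inputs]


if __name__ == "__main__":
    inputs = ["25/03/2019", "01-12-2018", "7/4/2020"]
    print(transform(inputs, [("25/03/2019", "2019")]))
    # ('date', ('year',), ['2019', '2018', '2020'])
```

Restricting the search to the primitives of one domain is what keeps the enumeration small here; with a single shared pool of primitives the number of candidate compositions grows much faster with depth.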
Cite
Text
Ochando et al. "BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019. doi:10.1007/978-3-030-46133-1_45
Markdown
[Ochando et al. "BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019.](https://mlanthology.org/ecmlpkdd/2019/ochando2019ecmlpkdd-bkadapt/) doi:10.1007/978-3-030-46133-1_45
BibTeX
@inproceedings{ochando2019ecmlpkdd-bkadapt,
title = {{BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation}},
author = {Ochando, Lidia Contreras and Ferri, César and Hernández-Orallo, José and Martínez-Plumed, Fernando and Ramírez-Quintana, María José and Katayama, Susumu},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2019},
pages = {755--759},
doi = {10.1007/978-3-030-46133-1_45},
url = {https://mlanthology.org/ecmlpkdd/2019/ochando2019ecmlpkdd-bkadapt/}
}