BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation

Abstract

An enormous effort is usually devoted to data wrangling, the tedious process of cleaning, transforming and combining data so that it is ready for modelling, visualisation or aggregation. Data transformation and formatting is a common task in data wrangling, which humans perform in two steps: (1) they recognise the specific domain of the data (dates, phones, addresses, etc.) and (2) they apply conversions that are specific to that domain. However, the mechanisms for manipulating one domain can be unique and very different from those of other domains. In this paper we present BK-ADAPT, a system that uses inductive programming (IP) with a dynamic background knowledge (BK) generated by a machine learning meta-model that selects the domain and/or the primitives from several descriptive features of the data wrangling problem. To show the performance of our method, we have created a web-based tool that allows users to provide a set of inputs and one or more examples of outputs, so that the rest of the examples are automatically transformed by the tool.
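The two-step idea described above (recognise the domain, then search only over primitives for that domain) can be illustrated with a small Python sketch. This is not the BK-ADAPT implementation: the regular-expression rules in `guess_domain` are a stand-in for the machine learning meta-model, the primitive sets are hypothetical, and the brute-force composition search in `synthesise` is only a toy substitute for an inductive programming system. All function and variable names are assumptions made for illustration.

```python
import re
from itertools import product

# Hypothetical domain-specific primitives (the "background knowledge").
DATE_PRIMITIVES = {
    "day": lambda s: re.split(r"[-/.]", s)[0],
    "month": lambda s: re.split(r"[-/.]", s)[1],
    "year": lambda s: re.split(r"[-/.]", s)[2],
}
PHONE_PRIMITIVES = {
    "digits": lambda s: re.sub(r"\D", "", s),
    "last4": lambda s: re.sub(r"\D", "", s)[-4:],
}
GENERIC_PRIMITIVES = {"upper": str.upper, "lower": str.lower, "strip": str.strip}
BK_BY_DOMAIN = {"date": DATE_PRIMITIVES, "phone": PHONE_PRIMITIVES}


def guess_domain(inputs):
    """Stand-in for the meta-model: pick a domain from surface features."""
    if all(re.fullmatch(r"\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4}", s) for s in inputs):
        return "date"
    if all(re.fullmatch(r"[+()\d\s-]{7,}", s) for s in inputs):
        return "phone"
    return "generic"


def synthesise(examples, primitives, max_depth=2):
    """Enumerate compositions of primitives until one fits every example."""
    for depth in range(1, max_depth + 1):
        for names in product(primitives, repeat=depth):
            def program(s, names=names):
                for name in names:
                    s = primitives[name](s)
                return s
            try:
                if all(program(i) == o for i, o in examples):
                    return names, program
            except IndexError:
                # A primitive did not apply to this input; try the next program.
                continue
    return None


def transform(inputs, examples):
    """Select the background knowledge dynamically, then induce and apply a program."""
    domain = guess_domain(inputs)
    bk = {**GENERIC_PRIMITIVES, **BK_BY_DOMAIN.get(domain, {})}
    found = synthesise(examples, bk)
    if found is None:
        return domain, None, None
    names, program = found
    return domain, names, [program(s) for s in inputs]


dates = ["25-03-1983", "12/11/2001", "01.02.1999"]
print(transform(dates, [("25-03-1983", "1983")]))
# ('date', ('year',), ['1983', '2001', '1999'])
```

Restricting the search to the primitives of the detected domain keeps the space of candidate programs small, which is the motivation the abstract gives for making the background knowledge dynamic.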

Cite

Text

Ochando et al. "BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019. doi:10.1007/978-3-030-46133-1_45

Markdown

[Ochando et al. "BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2019.](https://mlanthology.org/ecmlpkdd/2019/ochando2019ecmlpkdd-bkadapt/) doi:10.1007/978-3-030-46133-1_45

BibTeX

@inproceedings{ochando2019ecmlpkdd-bkadapt,
  title     = {{BK-ADAPT: Dynamic Background Knowledge for Automating Data Transformation}},
  author    = {Ochando, Lidia Contreras and Ferri, César and Hernández-Orallo, José and Martínez-Plumed, Fernando and Ramírez-Quintana, María José and Katayama, Susumu},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2019},
  pages     = {755--759},
  doi       = {10.1007/978-3-030-46133-1_45},
  url       = {https://mlanthology.org/ecmlpkdd/2019/ochando2019ecmlpkdd-bkadapt/}
}