SemFORMS: Automatic Generation of Semantic Transforms by Mining Data Science Code

Abstract

Careful choice of feature transformations in a dataset can help predictive model performance, data understanding and data exploration. However, finding useful features is a challenge, and while recent Automated Machine Learning (AutoML) systems provide some limited automation for feature engineering or data exploration, it is still mostly done by humans. We demonstrate a system called SemFORMS (Semantic Transforms), which attempts to mine useful expressions for a dataset, given access to a repository of code that may target the same or a similar dataset. In many enterprises, numerous data scientists often work on the same or similar datasets, but are largely unaware of each other's work. SemFORMS finds appropriate code from such a repository and normalizes it into an actionable transform that can be prepended to any AutoML pipeline. We demonstrate SemFORMS operating over example datasets from the OpenML benchmarks, where it sometimes leads to significant improvements in AutoML performance.

Cite

Text

Abdelaziz et al. "SemFORMS: Automatic Generation of Semantic Transforms by Mining Data Science Code." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/827

Markdown

[Abdelaziz et al. "SemFORMS: Automatic Generation of Semantic Transforms by Mining Data Science Code." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/abdelaziz2023ijcai-semforms/) doi:10.24963/IJCAI.2023/827

BibTeX

@inproceedings{abdelaziz2023ijcai-semforms,
  title     = {{SemFORMS: Automatic Generation of Semantic Transforms by Mining Data Science Code}},
  author    = {Abdelaziz, Ibrahim and Dolby, Julian and Khurana, Udayan and Samulowitz, Horst and Srinivas, Kavitha},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {7106-7109},
  doi       = {10.24963/IJCAI.2023/827},
  url       = {https://mlanthology.org/ijcai/2023/abdelaziz2023ijcai-semforms/}
}