Methods for Automatic Machine-Learning Workflow Analysis

Abstract

Developing real-world Machine Learning-based systems goes beyond algorithm development. ML algorithms are usually embedded in complex pre-processing steps and involve different stages such as development, testing, or deployment. Managing workflows poses several challenges, such as workflow versioning, sharing pipeline elements, or optimizing individual workflow elements, tasks which are usually conducted manually by data scientists. A dataset containing 16 035 real-world Machine Learning and Data Science workflows extracted from the ONE DATA platform (https://onelogic.de/en/one-data/) is explored and made available. Based on our analysis, we develop a representation learning algorithm using a graph-level Graph Convolutional Network (GCN) with explicit residuals which exploits workflow versioning history. Moreover, this method can easily be adapted to supervised tasks and outperforms state-of-the-art approaches in NAS-Bench-101 performance prediction. Another interesting application is the suggestion of component types, for which a classification baseline is presented. A slightly adapted GCN using both graph- and node-level information further improves upon this baseline. The codebase, along with all experimental setups and results, is available at https://github.com/wendli01/workflow_analysis.

Cite

Text

Wendlinger et al. "Methods for Automatic Machine-Learning Workflow Analysis." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86517-7_4

Markdown

[Wendlinger et al. "Methods for Automatic Machine-Learning Workflow Analysis." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/wendlinger2021ecmlpkdd-methods/) doi:10.1007/978-3-030-86517-7_4

BibTeX

@inproceedings{wendlinger2021ecmlpkdd-methods,
  title     = {{Methods for Automatic Machine-Learning Workflow Analysis}},
  author    = {Wendlinger, Lorenz and Berndl, Emanuel and Granitzer, Michael},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {52--67},
  doi       = {10.1007/978-3-030-86517-7_4},
  url       = {https://mlanthology.org/ecmlpkdd/2021/wendlinger2021ecmlpkdd-methods/}
}