Applications of Optimal Transport Distances in Unsupervised AutoML

Abstract

In this work, we explore the utility of Optimal Transport (OT)-based dataset similarity for finding similar unlabeled tabular datasets, particularly in the context of automated machine learning (AutoML) for unsupervised tasks. Unsupervised tasks lack a ground truth that optimization techniques can optimize toward, but historical information on which pipelines work best is often available. We therefore propose to meta-learn over prior tasks and transfer useful pipelines to new tasks. Our intuition is that pipelines that worked well on datasets with a similar underlying data distribution will also work well on new datasets. We use OT distances to measure this similarity between unlabeled tabular datasets and to recommend machine learning pipelines for two downstream unsupervised tasks: outlier detection and clustering. We obtain very promising results compared with existing baselines and state-of-the-art methods.
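The abstract does not specify the exact OT formulation, so the following is only a minimal Python sketch of the general idea under our own assumptions: both tables share the same standardized numeric columns, every row carries uniform mass, the ground cost is squared Euclidean, and entropy-regularized (Sinkhorn) OT stands in for exact OT. The names `sinkhorn_distance`, `recommend_pipeline`, and the `meta_base` layout are hypothetical and are not the authors' implementation.

```python
import numpy as np


def _logsumexp(M, axis):
    # Numerically stable log(sum(exp(M))) along one axis.
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(M - m).sum(axis=axis))


def sinkhorn_distance(X, Y, reg=0.05, n_iter=300):
    """Entropy-regularized OT cost between the empirical distributions of X and Y.

    Assumption: X (n x d) and Y (m x d) are unlabeled tables with the same
    d standardized numeric columns; each row receives uniform mass.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Squared Euclidean ground cost between every row of X and every row of Y.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    log_a = np.full(len(X), -np.log(len(X)))  # log of uniform weights 1/n
    log_b = np.full(len(Y), -np.log(len(Y)))  # log of uniform weights 1/m
    f = np.zeros(len(X))
    g = np.zeros(len(Y))
    # Log-domain Sinkhorn: alternately rescale potentials to match row and column marginals.
    for _ in range(n_iter):
        f = reg * log_a - reg * _logsumexp((g[None, :] - C) / reg, axis=1)
        g = reg * log_b - reg * _logsumexp((f[:, None] - C) / reg, axis=0)
    P = np.exp((f[:, None] + g[None, :] - C) / reg)  # approximate transport plan
    return float((P * C).sum())


def recommend_pipeline(X_new, meta_base, reg=0.05):
    """Return the stored pipeline of the prior dataset closest to X_new in OT distance.

    meta_base maps a dataset name to (X_prior, best_pipeline); this layout is a
    hypothetical stand-in for a store of historical AutoML results.
    """
    distances = {
        name: sinkhorn_distance(X_new, X_prior, reg=reg)
        for name, (X_prior, _) in meta_base.items()
    }
    nearest = min(distances, key=distances.get)
    return meta_base[nearest][1], nearest, distances[nearest]


# Usage with synthetic data and hypothetical pipeline labels.
rng = np.random.default_rng(0)
X_new = rng.normal(size=(40, 3))
meta_base = {
    "similar_dataset": (rng.normal(size=(50, 3)), "pipeline_A"),
    "shifted_dataset": (rng.normal(loc=4.0, size=(50, 3)), "pipeline_B"),
}
pipeline, name, dist = recommend_pipeline(X_new, meta_base)
print(pipeline, name)  # expected: pipeline_A similar_dataset
```

The updates run in the log domain because the plain kernel `exp(-C / reg)` underflows to zero when the regularization is small relative to the costs; a nearest-neighbor lookup over prior datasets is just the simplest way to turn the distance into a recommendation.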

Cite

Singh and Vanschoren. "Applications of Optimal Transport Distances in Unsupervised AutoML." NeurIPS 2023 Workshops: OTML, 2023.

BibTeX

@inproceedings{singh2023neuripsw-applications,
  title     = {{Applications of Optimal Transport Distances in Unsupervised AutoML}},
  author    = {Singh, Prabhant and Vanschoren, Joaquin},
  booktitle = {NeurIPS 2023 Workshops: OTML},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/singh2023neuripsw-applications/}
}