Applications of Optimal Transport Distances in Unsupervised AutoML
Abstract
In this work, we explore the utility of Optimal Transport-based dataset similarity for finding similar \textit{unlabeled tabular} datasets, especially in the context of automated machine learning (AutoML) on unsupervised tasks. Unsupervised tasks have no ground truth for optimization techniques to optimize towards, but historical information on which pipelines work best is often available, so we propose to meta-learn over prior tasks and transfer useful pipelines to new tasks. Our intuition is that pipelines that worked well on datasets with a \textit{similar underlying data distribution} will also work well on new datasets. We use Optimal Transport distances to measure this similarity between unlabeled tabular datasets and to recommend machine learning pipelines on two downstream unsupervised tasks: Outlier Detection and Clustering. We obtain promising results against existing baselines and state-of-the-art methods.
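The idea in the abstract (measure distributional similarity between unlabeled tables, then reuse the pipeline from the closest prior dataset) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the OT variant, ground cost, or preprocessing. The sketch assumes that all datasets share the same numeric feature space, that equal-size uniform subsamples are compared (so exact OT reduces to an assignment problem), and that the ground cost is squared Euclidean. The names `ot_distance`, `recommend_pipeline`, `meta_base`, and the pipeline labels are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_distance(X, Y, n_samples=200, seed=0):
    """2-Wasserstein distance between equal-size uniform subsamples of X and Y.

    Assumption: X and Y are numeric arrays with the same number of columns.
    With n points of weight 1/n on each side, an optimal transport plan is a
    permutation, so the assignment solver yields the exact OT cost.
    """
    if X.shape[1] != Y.shape[1]:
        raise ValueError("datasets must share the same feature space")
    rng = np.random.default_rng(seed)
    n = min(n_samples, len(X), len(Y))
    Xs = X[rng.choice(len(X), size=n, replace=False)]
    Ys = Y[rng.choice(len(Y), size=n, replace=False)]
    # Pairwise squared Euclidean ground cost, shape (n, n).
    cost = ((Xs[:, None, :] - Ys[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))


def recommend_pipeline(new_X, meta_base, **kwargs):
    """Return the pipeline stored with the prior dataset closest to new_X.

    meta_base (hypothetical structure): dict mapping a dataset name to a
    (data_array, best_known_pipeline) pair.
    """
    distances = {
        name: ot_distance(new_X, X, **kwargs) for name, (X, _) in meta_base.items()
    }
    nearest = min(distances, key=distances.get)
    return meta_base[nearest][1], nearest, distances


# Synthetic illustration: two prior datasets and one new, unlabeled dataset.
rng = np.random.default_rng(42)
prior_a = rng.normal(loc=0.0, scale=1.0, size=(300, 4))
prior_b = rng.normal(loc=5.0, scale=1.0, size=(300, 4))
new_data = rng.normal(loc=0.2, scale=1.0, size=(250, 4))

# Pipeline labels are placeholders, not results from the paper.
meta_base = {
    "prior_a": (prior_a, "pipeline_for_a"),
    "prior_b": (prior_b, "pipeline_for_b"),
}
pipeline, nearest, distances = recommend_pipeline(new_data, meta_base)
print(nearest, pipeline, distances)
```

The assignment formulation is exact but scales cubically with the subsample size, so a practical system would likely subsample aggressively or use an entropic (Sinkhorn) approximation instead; which of these the paper uses is not stated in the abstract.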
Cite
Text
Singh and Vanschoren. "Applications of Optimal Transport Distances in Unsupervised AutoML." NeurIPS 2023 Workshops: OTML, 2023.
Markdown
[Singh and Vanschoren. "Applications of Optimal Transport Distances in Unsupervised AutoML." NeurIPS 2023 Workshops: OTML, 2023.](https://mlanthology.org/neuripsw/2023/singh2023neuripsw-applications/)
BibTeX
@inproceedings{singh2023neuripsw-applications,
title = {{Applications of Optimal Transport Distances in Unsupervised AutoML}},
author = {Singh, Prabhant and Vanschoren, Joaquin},
booktitle = {NeurIPS 2023 Workshops: OTML},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/singh2023neuripsw-applications/}
}