OpenEnsembles: A Python Resource for Ensemble Clustering

Abstract

In this paper we introduce OpenEnsembles, a Python toolkit for performing and analyzing ensemble clustering. Ensemble clustering is the process of creating many clustering solutions for a given dataset and using the relationships observed across the ensemble to identify final solutions that are more robust, more stable, or otherwise better than the individual solutions within the ensemble. The OpenEnsembles library provides a unified interface for applying transformations to data, clustering data, visualizing individual clustering solutions, visualizing and finishing the ensemble, and calculating validation metrics for any given partitioning of the data. We have documented examples of using OpenEnsembles to create, analyze, and visualize a number of different types of ensemble approaches on toy and example datasets. OpenEnsembles is released under the GNU General Public License version 3, can be installed via Conda or the Python Package Index (pip), and is available at https://github.com/NaegleLab/OpenEnsembles.
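The general idea of building an ensemble and then "finishing" it can be sketched in a few lines. The example below does not use the OpenEnsembles API; it is a minimal illustration built on NumPy, SciPy, and scikit-learn, and the helper names `co_occurrence_matrix` and `consensus_labels`, the toy dataset, and the choice of average-linkage finishing are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def co_occurrence_matrix(labelings):
    """Fraction of ensemble members that put each pair of points in the same cluster."""
    labelings = np.asarray(labelings)
    n_solutions, n_points = labelings.shape
    co = np.zeros((n_points, n_points))
    for labels in labelings:
        # Boolean matrix: True where points i and j share a label in this solution.
        co += labels[:, None] == labels[None, :]
    return co / n_solutions


def consensus_labels(co, n_clusters):
    """Finish the ensemble by average-linkage clustering on 1 - co-occurrence."""
    distance = 1.0 - co
    np.fill_diagonal(distance, 0.0)
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data (hypothetical): three Gaussian blobs in two dimensions.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

# Build the ensemble: k-means runs that differ in seed and in the number of clusters.
ensemble = [
    KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    for seed, k in enumerate([2, 3, 3, 4, 5])
]

co = co_occurrence_matrix(ensemble)
final = consensus_labels(co, n_clusters=3)
print(co.shape, len(set(final)))
```

Here the ensemble varies only the seed and cluster count of a single algorithm; in practice an ensemble may also vary the algorithm, the data transformation, or the distance metric, and other finishing strategies are possible.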

Cite

Text

Ronan et al. "OpenEnsembles: A Python Resource for Ensemble Clustering." Machine Learning Open Source Software, 2018.

Markdown

[Ronan et al. "OpenEnsembles: A Python Resource for Ensemble Clustering." Machine Learning Open Source Software, 2018.](https://mlanthology.org/mloss/2018/ronan2018jmlr-openensembles/)

BibTeX

@article{ronan2018jmlr-openensembles,
  title     = {{OpenEnsembles: A Python Resource for Ensemble Clustering}},
  author    = {Ronan, Tom and Anastasio, Shawn and Qi, Zhijie and Tavares, Pedro Henrique S. Vieira and Sloutsky, Roman and Naegle, Kristen M.},
  journal   = {Machine Learning Open Source Software},
  year      = {2018},
  pages     = {1-6},
  volume    = {19},
  url       = {https://mlanthology.org/mloss/2018/ronan2018jmlr-openensembles/}
}