Estimating Class Separability of Text Embeddings with Persistent Homology

Gourgoulias, Kostis; Ghalyan, Najah; Labonne, Maxime; Satsangi, Yash; Moran, Sean; Sabelja, Joseph

TMLR 2024

Abstract

This paper introduces an unsupervised method to estimate the class separability of text datasets from a topological point of view. Using persistent homology, we demonstrate how tracking the evolution of embedding manifolds during training can inform about class separability. More specifically, we show how this technique can be applied to detect when the training process stops improving the separability of the embeddings. Our results, validated across binary and multi-class text classification tasks, show that the proposed method’s estimates of class separability align with those obtained from supervised methods. This approach offers a novel perspective on monitoring and improving the fine-tuning of sentence transformers for classification tasks, particularly in scenarios where labeled data is scarce. We also discuss how tracking these quantities can provide additional insights into the properties of the trained classifier.
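
As a rough illustration of the kind of topological summary the abstract refers to, the sketch below computes 0-dimensional persistence (connected components) for a small point cloud. It is not the paper's procedure: it relies only on the standard fact that the finite H0 death times of a Vietoris-Rips filtration equal the edge lengths of a Euclidean minimum spanning tree. The function name `h0_death_times` and the toy points are hypothetical, chosen for this example, and death times are reported at full edge length (some libraries use half that value).

```python
import math


def h0_death_times(points):
    """Return the sorted finite H0 death times of a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies when it merges with
    another, and those merge scales are the edge lengths of a minimum
    spanning tree, built here with Prim's algorithm in O(n^2).
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    best[0] = 0.0
    deaths = []
    for step in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            deaths.append(best[u])  # length of the MST edge that attached u
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return sorted(deaths)


# Toy stand-in for embeddings: two tight groups that lie far apart.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_deaths = h0_death_times(toy_points)
```

In this toy case four components die at scale 1 and the last merge happens only at the distance between the two groups, so one death time is much larger than the rest. A gap of that kind is the sort of label-free signal that could be tracked across training checkpoints, although the statistics actually used in the paper may differ.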

Cite

Text

Gourgoulias et al. "Estimating Class Separability of Text Embeddings with Persistent Homology." Transactions on Machine Learning Research, 2024.

Markdown

[Gourgoulias et al. "Estimating Class Separability of Text Embeddings with Persistent Homology." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/gourgoulias2024tmlr-estimating/)

BibTeX

@article{gourgoulias2024tmlr-estimating,
  title     = {{Estimating Class Separability of Text Embeddings with Persistent Homology}},
  author    = {Gourgoulias, Kostis and Ghalyan, Najah and Labonne, Maxime and Satsangi, Yash and Moran, Sean and Sabelja, Joseph},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/gourgoulias2024tmlr-estimating/}
}