Boosting Long-Tail Data Classification with Sparse Prototypical Networks

Abstract

Clinical Decision Support Systems (CDSS) have become ubiquitous in healthcare facilities, leveraging the increasing presence of Electronic Health Records (EHR). Predicting clinical outcomes from clinical text, such as identifying diagnoses based on the admission state of patients, is among the core tasks that a CDSS must address. The state of the art for this task was set by transformer encoder models and has recently been superseded by encoders enhanced with a prototypical network. The task remains a significant challenge because the outcome labels are substantially imbalanced: they follow a long-tailed distribution in which the majority of diagnoses are under-represented. Motivated by recent biologically inspired findings in deep learning, we propose S-Proto, a novel, efficient, and sparse prototypical layer. Our method achieves state-of-the-art performance in outcome diagnosis prediction without compromising the explainability characteristics of prototypical encoders. Quantitative results demonstrate that our approach is robust to the challenges presented by clinical notes and transfers successfully to a second, unseen dataset. Qualitative evaluation with medical doctors shows that S-Proto is capable of disaggregating the representations of a disease that manifests differently across patient cohorts.

Cite

Text

Figueroa et al. "Boosting Long-Tail Data Classification with Sparse Prototypical Networks." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70368-3_26

Markdown

[Figueroa et al. "Boosting Long-Tail Data Classification with Sparse Prototypical Networks." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/figueroa2024ecmlpkdd-boosting/) doi:10.1007/978-3-031-70368-3_26

BibTeX

@inproceedings{figueroa2024ecmlpkdd-boosting,
  title     = {{Boosting Long-Tail Data Classification with Sparse Prototypical Networks}},
  author    = {Figueroa, Alexei and Papaioannou, Jens-Michalis and Fallon, Conor and Bekiaridou, Alexandra and Bressem, Keno K. and Zanos, Stavros and Gers, Felix A. and Nejdl, Wolfgang and Löser, Alexander},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {434--449},
  doi       = {10.1007/978-3-031-70368-3_26},
  url       = {https://mlanthology.org/ecmlpkdd/2024/figueroa2024ecmlpkdd-boosting/}
}