Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification

Abstract

Whole Slide Image (WSI) classification has important applications in clinical pathology, e.g., tumor identification and cancer diagnosis. Currently, most research focuses on Multiple Instance Learning (MIL) with static datasets. A clear weakness of these methods is that they cannot efficiently preserve and reuse previously learned knowledge: whenever new data arrive, the classification model must be retrained on both the previous and the new data. To overcome this shortcoming and move beyond the traditional vision-only modality, this paper proposes the first Vision-Language-based framework with Queryable Prototype Multiple Instance Learning (QPMIL-VL), specifically designed for incremental WSI classification. The framework consists mainly of two information-processing branches: one generates bag-level features by prototype-guided aggregation of instance features, while the other enhances class features through a combination of class ensemble, a tunable vector, and a class similarity loss. Experiments on four public WSI datasets demonstrate that QPMIL-VL is effective for incremental WSI classification and often significantly outperforms the compared methods, achieving state-of-the-art (SOTA) performance.
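To make the idea of prototype-guided aggregation more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify how prototypes are queried or how instance features are pooled, so everything here is an assumption for illustration: the function names, the cosine-similarity attention, the `temperature` parameter, the mean over prototypes, and the cosine-based matching against class features (which in a vision-language setup would come from a text encoder).

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def prototype_guided_bag_feature(instances, prototypes, temperature=0.1):
    """Pool instance features (n, d) into one bag feature (d,).

    Assumption: each prototype (k, d) attends over all instances via
    cosine similarity, and the k prototype-level summaries are averaged.
    """
    sim = l2_normalize(prototypes) @ l2_normalize(instances).T  # (k, n)
    attn = softmax(sim / temperature, axis=1)  # each row sums to 1 over instances
    per_prototype = attn @ instances  # (k, d)
    return per_prototype.mean(axis=0)  # (d,)


def classify_bag(bag_feature, class_features):
    """Return the best class index and cosine scores against class features (c, d)."""
    scores = l2_normalize(class_features) @ l2_normalize(bag_feature)
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = rng.normal(size=(200, 32))   # hypothetical patch embeddings of one slide
    protos = rng.normal(size=(8, 32))      # hypothetical learnable prototypes
    classes = rng.normal(size=(4, 32))     # hypothetical class (text) embeddings
    bag = prototype_guided_bag_feature(patches, protos)
    label, scores = classify_bag(bag, classes)
    print(bag.shape, label, scores.shape)
```

In an incremental setting, one plausible design is to keep the feature encoders frozen and train only small components such as the prototypes, which is one way earlier knowledge could be preserved without retraining on old data; whether QPMIL-VL does exactly this is not stated in the abstract.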

Cite

Text

Gou et al. "Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I3.32325

Markdown

[Gou et al. "Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/gou2025aaai-queryable/) doi:10.1609/AAAI.V39I3.32325

BibTeX

@inproceedings{gou2025aaai-queryable,
  title     = {{Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification}},
  author    = {Gou, Jiaxiang and Ji, Luping and Liu, Pei and Ye, Mao},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {3158--3166},
  doi       = {10.1609/AAAI.V39I3.32325},
  url       = {https://mlanthology.org/aaai/2025/gou2025aaai-queryable/}
}