Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles

Abstract

We present ARCH, a computational pathology (CP) multiple instance captioning dataset to facilitate dense supervision of CP tasks. Existing CP datasets focus on narrow tasks; ARCH, by contrast, contains dense diagnostic and morphological descriptions for a range of stains, tissue types, and pathologies. Using intrinsic dimensionality estimation, we show that ARCH is the only CP dataset to (ARCH-)rival its computer vision analog, MS-COCO Captions. We conjecture that an encoder pre-trained on dense image captions learns transferable representations for most CP tasks. We support the conjecture with evidence that the ARCH representation transfers to a variety of pathology sub-tasks better than ImageNet features or representations obtained via self-supervised or multi-task learning on pathology images alone. We release our best model and invite other researchers to test it on their CP tasks.

Cite

Text

Gamper and Rajpoot. "Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.01628

Markdown

[Gamper and Rajpoot. "Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/gamper2021cvpr-multiple/) doi:10.1109/CVPR46437.2021.01628

BibTeX

@inproceedings{gamper2021cvpr-multiple,
  title     = {{Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles}},
  author    = {Gamper, Jevgenij and Rajpoot, Nasir},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {16549--16559},
  doi       = {10.1109/CVPR46437.2021.01628},
  url       = {https://mlanthology.org/cvpr/2021/gamper2021cvpr-multiple/}
}