BERTHop: An Effective Vision-and-Language Model for Chest X-Ray Disease Diagnosis
Abstract
Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them. Prior studies show that pre-trained V&L models can significantly improve performance on downstream tasks such as Visual Question Answering (VQA). However, V&L models are less effective when applied in the medical domain (e.g., on X-ray images and clinical notes) due to the domain gap. In this paper, we investigate the challenges of applying pre-trained V&L models in medical applications. In particular, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model that builds on PixelHop++ and VisualBERT to better capture the associations between the two modalities. Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis benchmark, show that BERTHop achieves an average Area Under the Curve (AUC) of 98.12%, which is 1.62% higher than the state of the art (SOTA), while being trained on a 9x smaller dataset.
Cite
Text
Monajatipoor et al. "BERTHop: An Effective Vision-and-Language Model for Chest X-Ray Disease Diagnosis." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00372
Markdown
[Monajatipoor et al. "BERTHop: An Effective Vision-and-Language Model for Chest X-Ray Disease Diagnosis." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/monajatipoor2021iccvw-berthop/) doi:10.1109/ICCVW54120.2021.00372
BibTeX
@inproceedings{monajatipoor2021iccvw-berthop,
title = {{BERTHop: An Effective Vision-and-Language Model for Chest X-Ray Disease Diagnosis}},
author = {Monajatipoor, Masoud and Rouhsedaghat, Mozhdeh and Li, Liunian Harold and Chien, Aichi and Kuo, C.-C. Jay and Scalzo, Fabien and Chang, Kai-Wei},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2021},
pages = {3327-3336},
doi = {10.1109/ICCVW54120.2021.00372},
url = {https://mlanthology.org/iccvw/2021/monajatipoor2021iccvw-berthop/}
}