Random Token Fusion for Multi-View Medical Diagnosis

Abstract

In multi-view medical diagnosis, deep learning-based models often fuse information from different imaging perspectives to improve diagnostic performance. However, existing approaches are prone to overfitting and rely heavily on view-specific features, which can lead to trivial solutions. In this work, we introduce Random Token Fusion (RTF), a novel technique designed to enhance multi-view medical image analysis using vision transformers. By integrating randomness into the feature fusion process during training, RTF addresses the issue of overfitting and enhances the robustness and accuracy of diagnostic models without incurring any additional cost at inference. We validate our approach on standard mammography and chest X-ray benchmark datasets. Through extensive experiments, we demonstrate that RTF consistently improves the performance of existing fusion methods, paving the way for a new generation of multi-view medical foundation models.
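The abstract does not spell out the fusion rule, so the following is only a minimal sketch of what "integrating randomness into the feature fusion process during training" could look like. It assumes two views have already been encoded into token sequences of equal length. During training, each position keeps the token of one randomly chosen view. At inference, a fixed element-wise mean stands in for whatever fusion method the underlying model already uses. The function name `random_token_fusion`, the 0.5 selection probability, and the mean fusion are all illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence

Token = List[float]


def random_token_fusion(
    view_a: Sequence[Token],
    view_b: Sequence[Token],
    training: bool,
    rng: Optional[random.Random] = None,
) -> List[Token]:
    """Fuse two equally long token sequences (hypothetical helper).

    Training: every position keeps the token from one randomly chosen view.
    Inference: deterministic element-wise mean (an assumed stand-in fusion).
    """
    if len(view_a) != len(view_b):
        raise ValueError("both views must have the same number of tokens")

    if not training:
        # No randomness at inference; the cost is that of the plain fusion.
        return [
            [(a + b) / 2.0 for a, b in zip(tok_a, tok_b)]
            for tok_a, tok_b in zip(view_a, view_b)
        ]

    if rng is None:
        rng = random.Random()

    # Assumption: each view is picked with probability 0.5 per token position.
    return [
        list(tok_a) if rng.random() < 0.5 else list(tok_b)
        for tok_a, tok_b in zip(view_a, view_b)
    ]


# Toy tokens standing in for two encoded views of the same study.
view_1 = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
view_2 = [[-1.0, -1.0], [-2.0, -2.0], [-3.0, -3.0]]

mixed = random_token_fusion(view_1, view_2, training=True, rng=random.Random(0))
fused = random_token_fusion(view_1, view_2, training=False)
```

In this sketch the random branch runs only when `training=True`, which mirrors the abstract's claim that the technique adds no cost at inference. A real vision transformer pipeline would apply the same idea to batched tensors rather than Python lists.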

Cite

Text

Guo et al. "Random Token Fusion for Multi-View Medical Diagnosis." NeurIPS 2024 Workshops: AIM-FM, 2024.

Markdown

[Guo et al. "Random Token Fusion for Multi-View Medical Diagnosis." NeurIPS 2024 Workshops: AIM-FM, 2024.](https://mlanthology.org/neuripsw/2024/guo2024neuripsw-random/)

BibTeX

@inproceedings{guo2024neuripsw-random,
  title     = {{Random Token Fusion for Multi-View Medical Diagnosis}},
  author    = {Guo, Jingyu and Matsoukas, Christos and Strand, Fredrik and Smith, Kevin},
  booktitle = {NeurIPS 2024 Workshops: AIM-FM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/guo2024neuripsw-random/}
}