A Comparative Study of Vision Transformer Encoders and Few-Shot Learning for Medical Image Classification

Abstract

Recently, computer vision has been significantly impacted by Vision Transformer (ViT) networks. These deep models have also succeeded in medical image classification. However, most existing deep learning-based methods rely on large amounts of labeled data to train reliable classifiers, a requirement that is often impractical in the medical field, where data is scarce and manual annotation is expensive. This study therefore explores the application of ViT in few-shot learning scenarios for medical image analysis, addressing the challenges posed by limited data availability. We evaluate various ViT models alongside few-shot learning algorithms (namely ProtoNet, MatchingNet, and Reptile), perform cross-domain experiments, and analyze the impact of data augmentation techniques. Our findings indicate that, when combined with ProtoNets, ViT architectures outperform CNN-based counterparts and achieve competitive performance against state-of-the-art approaches on benchmark datasets. Cross-domain experiments further confirm the effectiveness of ViT models in few-shot medical image classification.
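Per episode, the prototypical-network approach evaluated in the paper reduces to averaging the support-set embeddings of each class into a prototype and assigning every query to its nearest prototype. A minimal NumPy sketch of that episode logic, assuming pre-computed embeddings (which would come from a ViT encoder in the paper's setup; the function names here are illustrative, not from the authors' code):

```python
import numpy as np

def class_prototypes(support_emb, support_labels, n_way):
    """Mean embedding per class: the 'prototype' in ProtoNet.

    support_emb: (n_support, dim) array of encoder outputs.
    support_labels: (n_support,) integer class labels in [0, n_way).
    """
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_way)])

def classify_queries(query_emb, protos):
    """Assign each query to the prototype with the smallest
    squared Euclidean distance (ProtoNet's metric)."""
    diffs = query_emb[:, None, :] - protos[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)          # (n_query, n_way)
    return dists.argmin(axis=1)

# Toy 2-way, 2-shot episode with 3-dimensional embeddings.
support_emb = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                        [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]])
support_labels = np.array([0, 0, 1, 1])
protos = class_prototypes(support_emb, support_labels, n_way=2)

query_emb = np.array([[0.0, 0.0, 0.1], [5.0, 5.0, 4.9]])
preds = classify_queries(query_emb, protos)    # → array([0, 1])
```

During meta-training, the distances would be negated, softmaxed, and used in a cross-entropy loss; the sketch above covers only the episodic inference step.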

Cite

Text

Nurgazin and Tu. "A Comparative Study of Vision Transformer Encoders and Few-Shot Learning for Medical Image Classification." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00265

Markdown

[Nurgazin and Tu. "A Comparative Study of Vision Transformer Encoders and Few-Shot Learning for Medical Image Classification." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/nurgazin2023iccvw-comparative/) doi:10.1109/ICCVW60793.2023.00265

BibTeX

@inproceedings{nurgazin2023iccvw-comparative,
  title     = {{A Comparative Study of Vision Transformer Encoders and Few-Shot Learning for Medical Image Classification}},
  author    = {Nurgazin, Maxat and Tu, Nguyen Anh},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {2505--2513},
  doi       = {10.1109/ICCVW60793.2023.00265},
  url       = {https://mlanthology.org/iccvw/2023/nurgazin2023iccvw-comparative/}
}