Adaptive Inference for Medical Vision Transformers: Token Reduction or Early Exit?

Byun, Ji Young; Lee, HyunSeo; Shuff, Jordan; Venkatesh, Rengaraj; Shekhawat, Nakul S.; Parikh, Kunal S.; Chellappa, Rama

MIDL 2026 pp. 2171-2191

Abstract

Vision Transformers (ViTs) have demonstrated exceptional performance in medical image analysis, yet their computational demands hinder clinical deployment, particularly in time-sensitive applications. Medical imaging calls for sample-adaptive optimization because datasets differ across modalities and individual samples differ in complexity; uniform strategies fail to balance efficiency and accuracy well. We propose a unified adaptive inference framework that combines Token Reduction (TR) and Early Exiting (EE) through dataset-specific profiling. Our approach quantifies spatial redundancy via Jensen-Shannon Divergence (JSD) and measures prediction confidence at intermediate layers, then uses these signals to train a lightweight predictor that dynamically selects an inference strategy for each sample at test time. Across five medical datasets, including a real-world cataract dataset (INSIGHT), our framework achieves an average 71.4% reduction in floating-point operations (FLOPs) with only a 0.1 percentage point (pp) accuracy loss, substantially outperforming the individual strategies (FLOPs reduction: EE-only 55.9%, TR-only 57.7%). On PathMNIST, our adaptive inference framework simultaneously improves accuracy by 1.3pp while reducing computation by 77.2%. On INSIGHT, we maintain baseline accuracy with a 69.8% FLOPs reduction, demonstrating robust real-world clinical applicability.
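The abstract does not say exactly how JSD is applied to tokens or how the predictor is built, so the following is only a hypothetical sketch of the two profiling signals it names. It assumes each token can be summarized as a discrete probability distribution (for example, a normalized attention row or an intensity histogram) and that "prediction confidence" is the maximum softmax probability of an intermediate classifier head. The names `spatial_redundancy`, `select_strategy`, `flops_reduction` and the thresholds `tau_red`/`tau_conf` are invented for illustration. In particular, the threshold rule in `select_strategy` is a stand-in for the paper's trained lightweight predictor, not the authors' method.

```python
import math
from typing import Sequence


def _normalize(p: Sequence[float]) -> list[float]:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(p)
    if total <= 0:
        raise ValueError("distribution must have positive mass")
    return [x / total for x in p]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats, bounded above by ln 2."""
    p, q = _normalize(p), _normalize(q)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def spatial_redundancy(token_dists: Sequence[Sequence[float]]) -> float:
    """Hypothetical score in [0, 1]; 1 means every token has the same distribution.

    Computed as 1 - (mean JSD of each token to the mean token distribution) / ln 2.
    """
    dists = [_normalize(d) for d in token_dists]
    n, k = len(dists), len(dists[0])
    mean = [sum(d[i] for d in dists) / n for i in range(k)]
    mean_jsd = sum(js_divergence(d, mean) for d in dists) / n
    return 1.0 - mean_jsd / math.log(2)


def max_confidence(logits: Sequence[float]) -> float:
    """Maximum softmax probability of an (assumed) intermediate classifier head."""
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    return max(exps) / sum(exps)


def select_strategy(redundancy: float, exit_confidence: float,
                    tau_red: float = 0.8, tau_conf: float = 0.9) -> str:
    """Hypothetical threshold rule standing in for the trained lightweight predictor."""
    if exit_confidence >= tau_conf:
        return "early_exit"        # intermediate head is already confident
    if redundancy >= tau_red:
        return "token_reduction"   # many tokens look alike, so prune or merge them
    return "full"                  # fall back to the full ViT


def flops_reduction(adaptive_flops: float, baseline_flops: float) -> float:
    """Fraction of FLOPs saved relative to the full-model baseline."""
    return 1.0 - adaptive_flops / baseline_flops
```

For one sample, `select_strategy(spatial_redundancy(token_dists), max_confidence(logits))` would return a routing decision; the actual framework learns this mapping from dataset-specific profiles rather than fixing thresholds by hand. JSD is a natural choice for such a score because it is symmetric and bounded by ln 2, so the normalized redundancy always falls in [0, 1]. Under this reading, a "71.4% FLOPs reduction" corresponds to `flops_reduction` returning 0.714, i.e. the adaptive model uses 28.6% of the baseline FLOPs.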

Cite

Text

Byun et al. "Adaptive Inference for Medical Vision Transformers: Token Reduction or Early Exit?" Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, 2026.

Markdown

[Byun et al. "Adaptive Inference for Medical Vision Transformers: Token Reduction or Early Exit?" Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, 2026.](https://mlanthology.org/midl/2026/byun2026midl-adaptive/)

BibTeX

@inproceedings{byun2026midl-adaptive,
  title     = {{Adaptive Inference for Medical Vision Transformers: Token Reduction or Early Exit?}},
  author    = {Byun, Ji Young and Lee, HyunSeo and Shuff, Jordan and Venkatesh, Rengaraj and Shekhawat, Nakul S. and Parikh, Kunal S. and Chellappa, Rama},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  year      = {2026},
  pages     = {2171--2191},
  volume    = {315},
  url       = {https://mlanthology.org/midl/2026/byun2026midl-adaptive/}
}