TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models

Abstract

Existing foundation models (FMs) in the medical domain often require extensive fine-tuning or rely on training resource-intensive decoders, while many existing encoders are pretrained with objectives biased toward specific tasks. This illustrates a need for a strong, task-agnostic foundation model that requires minimal fine-tuning beyond feature extraction. In this work, we introduce TAP-CT, a suite of CT foundation models built by task-agnostic pretraining: a simple yet effective adaptation of Vision Transformers (ViTs) and DINOv2 for volumetric data that enables scalable self-supervised pretraining directly on 3D CT volumes. Our approach incorporates targeted modifications to patch embeddings, positional encodings, and volumetric augmentations, making the architecture depth-aware while preserving the simplicity of the underlying architectures. We show that large-scale 3D pretraining on an extensive in-house CT dataset (105K volumes) yields stable, robust frozen representations that generalize strongly across downstream tasks. To promote transparency and reproducibility, and to establish a powerful, low-resource baseline for future research in medical imaging, we will release all pretrained models, experimental configurations, and downstream benchmark code.

Cite

Text

Veenboer et al. "TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models." Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, 2026.

Markdown

[Veenboer et al. "TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models." Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, 2026.](https://mlanthology.org/midl/2026/veenboer2026midl-tapct/)

BibTeX

@inproceedings{veenboer2026midl-tapct,
  title     = {{TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models}},
  author    = {Veenboer, Tim and Yiasemis, George and Marcus, Eric and van Veldhuizen, Vivien and Snoek, Cees G. M. and Teuwen, Jonas and Groot Lipman, Kevin B. W.},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  year      = {2026},
  pages     = {726--753},
  volume    = {315},
  url       = {https://mlanthology.org/midl/2026/veenboer2026midl-tapct/}
}