Hybrid Ladder Transformers with Efficient Parallel-Cross Attention for Medical Image Segmentation

Abstract

Most existing transformer-based network architectures for computer vision tasks are large (in number of parameters) and require large-scale datasets for training. However, medical imaging datasets are typically much smaller than those used in general vision applications, which makes it difficult to train transformers effectively for medical imaging. At the same time, transformer-based architectures encode long-range dependencies in the data and can learn more global representations. This could bridge the gap left by convolutional neural networks (CNNs), which primarily operate on features extracted from local image neighbourhoods. In this work, we present a hybrid transformer-based approach for medical image segmentation that works in conjunction with a CNN. We propose to use learnable global attention heads alongside a traditional convolutional segmentation architecture to encode long-range dependencies. Specifically, in the proposed architecture, the local information extracted by convolution operations and the global information learned by self-attention mechanisms are fused using bi-directional cross attention during the encoding process, resulting in what we call a hybrid ladder transformer (HyLT). We evaluate the proposed network on two different medical image segmentation datasets. The results show that it outperforms the relevant CNN- and transformer-based architectures.
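
The abstract describes the fusion of local and global information only at a high level. As a rough illustration, the following is a minimal NumPy sketch of how CNN feature maps and transformer tokens could exchange information through bi-directional cross attention. It is not the authors' HyLT implementation: the function names (`cross_attention`, `bidirectional_fusion`), the single-head attention, the random projection weights, the residual additions, and the choice to compute both directions in parallel from the same inputs are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: `queries` attend to `context`."""
    q = queries @ w_q                         # (Nq, d)
    k = context @ w_k                         # (Nc, d)
    v = context @ w_v                         # (Nc, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (Nq, Nc)
    return softmax(scores, axis=-1) @ v       # (Nq, d)

def bidirectional_fusion(cnn_map, tokens, params):
    """Hypothetical fusion step (assumption, not the paper's exact design):
    CNN features and transformer tokens exchange information through two
    cross-attention passes, each added back to its input residually."""
    c, h, w = cnn_map.shape
    local = cnn_map.reshape(c, h * w).T       # (H*W, C): one token per pixel
    # Local -> global: transformer tokens query the CNN pixel features.
    new_tokens = tokens + cross_attention(tokens, local, *params["to_global"])
    # Global -> local: CNN pixels query the (original) transformer tokens,
    # so both directions are computed in parallel from the same inputs.
    new_local = local + cross_attention(local, tokens, *params["to_local"])
    return new_local.T.reshape(c, h, w), new_tokens

# Usage with arbitrary (hypothetical) sizes and random projection weights.
rng = np.random.default_rng(0)
C, H, W, N = 16, 8, 8, 10

def make_proj():
    # Query, key and value projections, each mapping C -> C channels.
    return tuple(rng.standard_normal((C, C)) * 0.1 for _ in range(3))

params = {"to_global": make_proj(), "to_local": make_proj()}
cnn_map = rng.standard_normal((C, H, W))
tokens = rng.standard_normal((N, C))
fused_map, fused_tokens = bidirectional_fusion(cnn_map, tokens, params)
print(fused_map.shape, fused_tokens.shape)  # (16, 8, 8) (10, 16)
```

Keeping the projection output width equal to the channel count C lets each branch add the attended features back to its own input, so the CNN path keeps its spatial layout while the token path keeps its sequence length.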

Cite

Luo et al. "Hybrid Ladder Transformers with Efficient Parallel-Cross Attention for Medical Image Segmentation." Medical Imaging with Deep Learning, 2023.

BibTeX

@inproceedings{luo2023midl-hybrid,
  title     = {{Hybrid Ladder Transformers with Efficient Parallel-Cross Attention for Medical Image Segmentation}},
  author    = {Luo, Haozhe and Changdong, Yu and Selvan, Raghavendra},
  booktitle = {Medical Imaging with Deep Learning},
  year      = {2023},
  pages     = {808--819},
  volume    = {172},
  url       = {https://mlanthology.org/midl/2023/luo2023midl-hybrid/}
}