Fourier Image Transformer

Abstract

Transformer architectures show spectacular performance on NLP tasks and have recently also been used for tasks such as image completion or image classification. Here we propose to use a sequential image representation, where each prefix of the complete sequence describes the whole image at reduced resolution. Using such Fourier Domain Encodings (FDEs), an auto-regressive image completion task is equivalent to predicting a higher resolution output given a low-resolution input. Additionally, we show that an encoder-decoder setup can be used to query arbitrary Fourier coefficients given a set of Fourier domain observations. We demonstrate the practicality of this approach in the context of computed tomography (CT) image reconstruction. In summary, we show that Fourier Image Transformer (FIT) can be used to solve relevant image analysis tasks in Fourier space, a domain inherently inaccessible to convolutional architectures.
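The core idea of an FDE, as described in the abstract, can be illustrated with a minimal numpy sketch: order an image's 2D Fourier coefficients by radial frequency so that any prefix of the resulting sequence corresponds to a low-pass (reduced-resolution) version of the whole image, while the full sequence reconstructs it exactly. This is a hypothetical illustration of the principle, not the authors' actual encoding (the paper's FDE also involves normalization and a learned embedding).

```python
import numpy as np

def fourier_domain_encoding(img):
    """Sketch of an FDE-style sequence: Fourier coefficients of `img`
    flattened in order of increasing radial frequency, so each prefix
    describes the whole image at reduced resolution.
    (Illustrative only; not the paper's exact implementation.)"""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # radial distance of each coefficient from the zero-frequency centre
    r = np.hypot(yy - h // 2, xx - w // 2)
    order = np.argsort(r.ravel(), kind="stable")
    return F.ravel()[order], order

def decode_prefix(seq, order, shape, n):
    """Reconstruct an image from the first n sequence elements,
    zeroing all higher-frequency coefficients (i.e. low-pass filtering)."""
    F = np.zeros(int(np.prod(shape)), dtype=complex)
    F[order[:n]] = seq[:n]
    return np.real(np.fft.ifft2(np.fft.ifftshift(F.reshape(shape))))

# Usage: a short prefix yields a blurred, low-resolution version of the
# image; the full sequence reconstructs the input exactly.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
seq, order = fourier_domain_encoding(img)
low_res = decode_prefix(seq, order, img.shape, seq.size // 8)
full = decode_prefix(seq, order, img.shape, seq.size)
```

Under this framing, auto-regressively extending the coefficient sequence is the same as predicting a higher-resolution output from a low-resolution input, which is the completion task the paper trains FIT on.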

Cite

Text

Buchholz and Jug. "Fourier Image Transformer." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00201

Markdown

[Buchholz and Jug. "Fourier Image Transformer." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/buchholz2022cvprw-fourier/) doi:10.1109/CVPRW56347.2022.00201

BibTeX

@inproceedings{buchholz2022cvprw-fourier,
  title     = {{Fourier Image Transformer}},
  author    = {Buchholz, Tim-Oliver and Jug, Florian},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2022},
  pages     = {1845--1853},
  doi       = {10.1109/CVPRW56347.2022.00201},
  url       = {https://mlanthology.org/cvprw/2022/buchholz2022cvprw-fourier/}
}