JAFAR: Jack up Any Feature at Any Resolution

Abstract

Foundation Vision Encoders have become indispensable across a wide range of dense vision tasks. However, their operation at low spatial feature resolutions necessitates subsequent feature upsampling to enable full-resolution processing. To address this limitation, we introduce JAFAR, a lightweight and flexible feature upsampler designed to enhance the spatial resolution of visual features from any Foundation Vision Encoder to any target resolution. JAFAR features an attention-based upsampling module that aligns the spatial representations of high-resolution queries with semantically enriched low-resolution keys via Spatial Feature Transform modulation. Despite the absence of high-resolution feature ground truth, we find that learning at low upsampling ratios and resolutions generalizes surprisingly well to much higher scales. Extensive experiments demonstrate that JAFAR recovers intricate pixel-level details and consistently outperforms existing feature upsampling techniques across a diverse set of dense downstream applications.
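The attention-based upsampling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that keys are built by applying a per-position scale and shift (Spatial Feature Transform style) to low-resolution image features, that the scale and shift are predicted from the encoder features, and that the upsampled output is an attention-weighted average of the low-resolution encoder features. The function name `sft_attention_upsample`, the parameter names `W_gamma` and `W_beta`, the random projections standing in for learned layers, and all shapes are hypothetical choices for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sft_attention_upsample(queries_hr, image_feats_lr, encoder_feats_lr, W_gamma, W_beta):
    """Hypothetical sketch of attention-based feature upsampling.

    queries_hr:       (Nq, d) query vectors, one per high-resolution position
    image_feats_lr:   (Nk, d) low-resolution image features used as the key base
    encoder_feats_lr: (Nk, c) low-resolution encoder features used as values
    W_gamma, W_beta:  (c, d) stand-ins for learned projections (assumption)
    """
    # SFT-style modulation: scale and shift predicted from the encoder features.
    gamma = encoder_feats_lr @ W_gamma            # (Nk, d)
    beta = encoder_feats_lr @ W_beta              # (Nk, d)
    keys = gamma * image_feats_lr + beta          # semantically enriched keys

    # Scaled dot-product attention from high-res queries to low-res keys.
    d = queries_hr.shape[1]
    attn = softmax(queries_hr @ keys.T / np.sqrt(d), axis=-1)   # (Nq, Nk)

    # Each high-res position receives a convex combination of low-res features.
    return attn @ encoder_feats_lr, attn

# Toy usage: upsample a 4x4 feature map to 16x16 (all values are random).
rng = np.random.default_rng(0)
h_lr, w_lr, h_hr, w_hr, d, c = 4, 4, 16, 16, 8, 32
queries_hr = rng.normal(size=(h_hr * w_hr, d))
image_feats_lr = rng.normal(size=(h_lr * w_lr, d))
encoder_feats_lr = rng.normal(size=(h_lr * w_lr, c))
W_gamma = rng.normal(size=(c, d)) / np.sqrt(c)
W_beta = rng.normal(size=(c, d)) / np.sqrt(c)

up, attn = sft_attention_upsample(queries_hr, image_feats_lr, encoder_feats_lr, W_gamma, W_beta)
print(up.shape)  # (256, 32)
```

Because the number of queries is independent of the number of keys in this sketch, the same function accepts any target resolution, which is one plausible reading of how a model trained at low upsampling ratios could be applied at higher ones.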

Cite

Text

Couairon et al. "JAFAR: Jack up Any Feature at Any Resolution." Advances in Neural Information Processing Systems, 2025.

Markdown

[Couairon et al. "JAFAR: Jack up Any Feature at Any Resolution." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/couairon2025neurips-jafar/)

BibTeX

@inproceedings{couairon2025neurips-jafar,
  title     = {{JAFAR: Jack up Any Feature at Any Resolution}},
  author    = {Couairon, Paul and Chambon, Loick and Serrano, Louis and Haugeard, Jean-Emmanuel and Cord, Matthieu and Thome, Nicolas},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/couairon2025neurips-jafar/}
}