SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation
Abstract
The ability to detect objects in images at varying scales has played a pivotal role in the design of modern object detectors. Despite considerable progress in removing hand-crafted components and simplifying the architecture with transformers, multi-scale feature maps and pyramid designs remain a key factor in their empirical success. In this paper, we show that shifting the multi-scale inductive bias into the attention mechanism can work well, resulting in a plain detector, ‘SimPLR’, whose backbone and detection head are both non-hierarchical and operate on single-scale features. Our experiments show that SimPLR, with its scale-aware attention, is a plain and simple architecture, yet competitive with multi-scale vision transformer alternatives. Compared to the multi-scale and single-scale state-of-the-art, our model scales better with larger-capacity (self-supervised) models and more pre-training data, allowing us to report consistently better accuracy and faster runtime for object detection, instance segmentation, as well as panoptic segmentation. Code is released at https://github.com/kienduynguyen/SimPLR.
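The core idea is to obtain multi-scale context from the attention mechanism itself rather than from a feature pyramid. Purely as an illustrative sketch of that general idea, and not the authors' released implementation, the PyTorch-style module below realizes one such scheme: each attention head pools the single-scale feature map at a different window size before attending, so different heads see different effective scales. The class name, the scales parameter, and the pooling strategy are all assumptions introduced here for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareAttention(nn.Module):
    """Toy scale-aware attention: each head attends over the single-scale
    feature map average-pooled at a different window size, so multi-scale
    context comes from attention itself rather than a feature pyramid.
    The names and the pooling scheme are illustrative assumptions, not
    the mechanism from the SimPLR paper."""

    def __init__(self, dim: int, scales=(1, 2, 4, 8)):
        super().__init__()
        assert dim % len(scales) == 0, "dim must split evenly across heads"
        self.scales = scales
        self.head_dim = dim // len(scales)
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) single-scale feature map from a plain backbone
        B, H, W, C = x.shape
        # One query per position, split across scale-specific heads.
        q = self.q(x).reshape(B, H * W, len(self.scales), self.head_dim)
        kv = self.kv(x).permute(0, 3, 1, 2)  # (B, 2C, H, W)
        out = []
        for i, s in enumerate(self.scales):
            # Pool keys/values at this head's scale (s=1 keeps full resolution).
            pooled = F.avg_pool2d(kv, s) if s > 1 else kv
            k, v = pooled.flatten(2).transpose(1, 2).chunk(2, dim=-1)  # (B, N_s, C)
            k = k[..., i * self.head_dim:(i + 1) * self.head_dim]
            v = v[..., i * self.head_dim:(i + 1) * self.head_dim]
            attn = (q[:, :, i] @ k.transpose(1, 2)) / self.head_dim ** 0.5
            out.append(attn.softmax(dim=-1) @ v)  # (B, HW, head_dim)
        out = torch.cat(out, dim=-1).reshape(B, H, W, C)
        return self.proj(out)

# Example: a 64x64 single-scale map with 256 channels (hypothetical sizes).
attn = ScaleAwareAttention(dim=256, scales=(1, 2, 4, 8))
y = attn(torch.randn(2, 64, 64, 256))  # -> (2, 64, 64, 256)

Note that coarser heads attend over fewer, larger-receptive-field tokens, which is also cheaper; this matches the abstract's claim that a single-scale design can remain efficient, though the actual SimPLR mechanism may differ.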
Cite
Nguyen et al. "SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation." Transactions on Machine Learning Research, 2025. https://mlanthology.org/tmlr/2025/nguyen2025tmlr-simplr/

BibTeX:
@article{nguyen2025tmlr-simplr,
  title = {{SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation}},
  author = {Nguyen, Duy Kien and Oswald, Martin R. and Snoek, Cees G. M.},
  journal = {Transactions on Machine Learning Research},
  year = {2025},
  url = {https://mlanthology.org/tmlr/2025/nguyen2025tmlr-simplr/}
}