ML-Decoder: Scalable and Versatile Classification Head

Abstract

In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture and using a novel group-decoding scheme, ML-Decoder is highly efficient and scales well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile: it can be used as a drop-in replacement for various classification heads, and it generalizes to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.1% mAP; on NUS-WIDE zero-shot, we reach 31.1% ZSL mAP; and on ImageNet single-label, with a vanilla ResNet50 backbone, we reach a new top score of 80.7%, without extra data or distillation. Public code will be available.
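
To make the contrast between query-based prediction and global average pooling concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the function names (`gap_head`, `query_head`), the single attention step, the random weights, and the way each group query is expanded into several class logits are all assumptions chosen for illustration. The abstract does not specify these details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gap_head(features, w_cls):
    """Baseline head: global average pooling, then a linear classifier.

    features: (HW, D) spatial feature vectors, w_cls: (D, N) -> (N,) logits.
    """
    return features.mean(axis=0) @ w_cls

def query_head(features, queries, w_group):
    """Hypothetical query-based head with grouped outputs.

    Each of the K queries attends over the HW spatial positions, so every
    query pools the feature map with its own weights instead of a uniform
    average. Each pooled vector is then mapped to G class logits (an
    assumed form of group decoding), giving K * G logits in total.

    features: (HW, D), queries: (K, D), w_group: (K, D, G) -> (K * G,)
    """
    d = features.shape[1]
    attn = softmax(queries @ features.T / np.sqrt(d), axis=-1)  # (K, HW)
    pooled = attn @ features                                    # (K, D)
    logits = np.einsum("kd,kdg->kg", pooled, w_group)           # (K, G)
    return logits.reshape(-1)

rng = np.random.default_rng(0)
HW, D, K, G = 49, 16, 4, 5          # illustrative sizes only
features = rng.normal(size=(HW, D))
queries = rng.normal(size=(K, D))
w_group = rng.normal(size=(K, D, G))

logits = query_head(features, queries, w_group)
print(logits.shape)  # (20,)
```

In this sketch the number of queries K is independent of the number of classes K * G, which illustrates why a grouped scheme can keep attention cost manageable as the class count grows. With all-zero queries the attention weights are uniform, and the pooled vectors reduce to the global average used by `gap_head`.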

Cite

Text

Ridnik et al. "ML-Decoder: Scalable and Versatile Classification Head." Winter Conference on Applications of Computer Vision, 2023.

Markdown

[Ridnik et al. "ML-Decoder: Scalable and Versatile Classification Head." Winter Conference on Applications of Computer Vision, 2023.](https://mlanthology.org/wacv/2023/ridnik2023wacv-mldecoder/)

BibTeX

@inproceedings{ridnik2023wacv-mldecoder,
  title     = {{ML-Decoder: Scalable and Versatile Classification Head}},
  author    = {Ridnik, Tal and Sharir, Gilad and Ben-Cohen, Avi and Ben-Baruch, Emanuel and Noy, Asaf},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2023},
  pages     = {32--41},
  url       = {https://mlanthology.org/wacv/2023/ridnik2023wacv-mldecoder/}
}