SMILe: Leveraging Submodular Mutual Information for Robust Few-Shot Object Detection

Abstract

Confusion and forgetting of object classes have been challenges of prime interest in Few-Shot Object Detection (FSOD). To overcome these pitfalls in metric-learning-based FSOD techniques, we introduce a novel Submodular Mutual Information Learning (SMILe) framework for loss functions, which adopts combinatorial mutual information functions as learning objectives to enforce learning of well-separated feature clusters between the base and novel classes. Additionally, the joint objective in SMILe minimizes the total submodular information contained in a class, leading to discriminative feature clusters. The combined effect of this joint objective yields significant improvements in class confusion and forgetting in FSOD. Further, we show that SMILe generalizes to several existing approaches in FSOD, improving their performance regardless of the backbone architecture. Experiments on popular FSOD benchmarks, PASCAL-VOC and MS-COCO, show that our approach generalizes to State-of-the-Art (SoTA) approaches, improving their novel class performance by up to 5.7% (3.3 mAP points) and 5.4% (2.6 mAP points) on the 10-shot setting of VOC (split 3) and the 30-shot setting of COCO, respectively. Our experiments also demonstrate better retention of base class performance and up to 2× faster convergence over existing approaches, regardless of the underlying architecture.

Project page: https://anaymajee.me/assets/project_pages/smile.html
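The notion of submodular mutual information between two sets of features can be illustrated with a small sketch. The code below is not the loss defined in the paper: it assumes a facility-location function over clipped cosine similarities as the submodular function f, and applies the standard definition I_f(A; B) = f(A) + f(B) - f(A ∪ B). All function names and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def similarity_matrix(X):
    """Pairwise cosine similarity, clipped to be non-negative (an assumption of this sketch)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return np.clip(Xn @ Xn.T, 0.0, None)


def facility_location(S, subset):
    """Facility-location function: f(A) = sum over all i of max_{j in A} S[i, j], with f(empty) = 0."""
    idx = sorted(subset)
    if not idx:
        return 0.0
    return float(S[:, idx].max(axis=1).sum())


def submodular_mutual_information(S, A, B):
    """I_f(A; B) = f(A) + f(B) - f(A union B)."""
    A, B = set(A), set(B)
    return (facility_location(S, A)
            + facility_location(S, B)
            - facility_location(S, A | B))


# Toy 2-D features: indices 0-1 play the role of a base class, 2-3 a novel class.
base, novel = {0, 1}, {2, 3}

# Case 1: the two classes point in nearly orthogonal directions (well separated).
separated = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
# Case 2: the novel features lie close to the base features (confusable).
confused = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.9, 0.2]])

smi_separated = submodular_mutual_information(similarity_matrix(separated), base, novel)
smi_confused = submodular_mutual_information(similarity_matrix(confused), base, novel)
```

In this toy setting the confusable arrangement yields a larger mutual information value than the well-separated one, which is the intuition behind using such quantities as objectives that push base and novel clusters apart.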

Cite

Text

Majee et al. "SMILe: Leveraging Submodular Mutual Information for Robust Few-Shot Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73411-3_20

Markdown

[Majee et al. "SMILe: Leveraging Submodular Mutual Information for Robust Few-Shot Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/majee2024eccv-smile/) doi:10.1007/978-3-031-73411-3_20

BibTeX

@inproceedings{majee2024eccv-smile,
  title     = {{SMILe: Leveraging Submodular Mutual Information for Robust Few-Shot Object Detection}},
  author    = {Majee, Anay and Sharp, Ryan X and Iyer, Rishabh},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73411-3_20},
  url       = {https://mlanthology.org/eccv/2024/majee2024eccv-smile/}
}