GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation
Abstract
Incremental object detection (IOD) is a challenging task that requires detection models to continuously learn from newly arriving data. This work focuses on incremental learning for vision-language detectors (VLDs), an underexplored domain. Existing research typically adopts a local alignment paradigm to avoid label conflicts, in which different tasks are learned separately without interaction. However, we reveal that this practice fails to effectively preserve the semantic structure. Specifically, the aligned relationships between objects and texts collapse when the model handles novel categories, ultimately leading to catastrophic forgetting. Although knowledge distillation (KD) is a common approach to this problem, traditional KD performs poorly when applied directly to VLDs, because a natural knowledge gap exists between phases in both the encoding and decoding processes. To address these issues, we propose a novel method called Global alignment and Correspondence Distillation (GCD). Unlike prior work, we first integrate knowledge across phases within the same embedding space to construct a global semantic structure. We then enable effective knowledge distillation in VLDs through a semantic correspondence mechanism, ensuring consistent proposal generation and decoding. On top of that, we distill the teacher model’s informative predictions and topological relationships to maintain a stable local semantic structure. Extensive experiments on COCO 2017 demonstrate that our method significantly outperforms existing approaches, achieving new state-of-the-art results in various IOD scenarios.
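The abstract does not specify GCD's loss functions. As a rough illustration only, the sketch below shows two generic distillation terms of the kind the final step alludes to: a temperature-softened KL divergence between teacher and student class predictions (standard Hinton-style KD), and a relation term that matches the pairwise cosine-similarity matrices of teacher and student embeddings, which is one common way to preserve "topological relationships". All function names, the temperature value, and the choice of cosine similarity with squared error are assumptions for illustration, not the paper's formulation.

```python
import math

# Hypothetical sketch: generic prediction + relation distillation terms.
# NOT the GCD paper's actual losses; names and hyperparameters are assumed.

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def prediction_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) over proposals, using temperature-softened
    class distributions; scaled by T^2 as in standard KD (assumed choice)."""
    losses = []
    for t, s in zip(teacher_logits, student_logits):
        p_t = softmax(t, temperature)
        p_s = softmax(s, temperature)
        losses.append(kl_divergence(p_t, p_s) * temperature ** 2)
    return sum(losses) / len(losses)

def cosine(u, v):
    """Cosine similarity between two vectors (small epsilon avoids 0/0)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def similarity_matrix(embeddings):
    """Pairwise cosine-similarity matrix: the 'topology' of a set of embeddings."""
    return [[cosine(u, v) for v in embeddings] for u in embeddings]

def relation_distillation_loss(teacher_emb, student_emb):
    """Mean squared difference between teacher and student similarity matrices."""
    st = similarity_matrix(teacher_emb)
    ss = similarity_matrix(student_emb)
    n = len(st)
    return sum((st[i][j] - ss[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)
```

Both terms are zero when the student reproduces the teacher exactly. Because the relation term compares cosine similarities, it is unchanged when the student's embeddings are uniformly rescaled or rotated; it constrains the relative geometry of the embeddings rather than their absolute values, which is the intuition behind preserving a stable semantic structure while the student adapts to new categories.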
Cite
Wang et al. "GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32864
@inproceedings{wang2025aaai-gcd,
title = {{GCD: Advancing Vision-Language Models for Incremental Object Detection via Global Alignment and Correspondence Distillation}},
author = {Wang, Xu and Wang, Zilei and Lin, Zihan},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {8015-8023},
doi = {10.1609/AAAI.V39I8.32864},
url = {https://mlanthology.org/aaai/2025/wang2025aaai-gcd/}
}