Adapt Without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models
Abstract
Multi-modal models such as CLIP possess remarkable zero-shot transfer capabilities, making them highly effective in continual learning tasks. However, this advantage is severely compromised by catastrophic forgetting, which undermines the valuable zero-shot abilities of these models. Existing methods predominantly focus on preserving zero-shot capabilities but often fall short of fully exploiting the rich modal information inherent in multi-modal models. In this paper, we propose a strategy that enhances both zero-shot transfer ability and adaptability to new data distributions. We introduce a novel graph-based multi-modal proximity distillation approach that preserves the intra- and inter-modal information of the visual and textual modalities. This approach is further enhanced with a sample re-weighting mechanism that dynamically adjusts the influence of each teacher on each individual sample. Experimental results demonstrate a considerable improvement over existing methodologies, illustrating the effectiveness of the proposed method in the field of continual learning. Code is available at github.com/myz-ah/AwoForget.
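The abstract's core idea can be illustrated with a minimal sketch. The code below is hypothetical, not the paper's implementation: it treats "proximity" as a pairwise cosine-similarity graph over a batch of embeddings, distills the student's graph toward a per-sample blend of two teachers' graphs (e.g. a frozen zero-shot model and an adapted model), and uses a per-sample weight to shift trust between the two teachers. All function names and the specific loss form are illustrative assumptions.

```python
# Illustrative sketch only (hypothetical names, not the paper's code):
# distill a pairwise cosine-similarity ("proximity") graph from two
# teachers, with a per-sample weight blending the teachers' targets.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def proximity_graph(embeds):
    """Pairwise cosine-similarity matrix over a batch of embeddings."""
    return [[cosine(u, v) for v in embeds] for u in embeds]

def dual_teacher_proximity_loss(student, teacher_a, teacher_b, weights):
    """Mean squared gap between the student's proximity graph and a
    per-sample convex combination of the two teachers' graphs.
    weights[i] in [0, 1] is sample i's trust in teacher_a."""
    gs = proximity_graph(student)
    ga = proximity_graph(teacher_a)
    gb = proximity_graph(teacher_b)
    n = len(student)
    loss = 0.0
    for i in range(n):
        w = weights[i]
        for j in range(n):
            # Blend the two teachers' similarity targets per anchor sample i.
            target = w * ga[i][j] + (1.0 - w) * gb[i][j]
            loss += (gs[i][j] - target) ** 2
    return loss / (n * n)
```

Under this toy formulation, the loss vanishes when the student's proximity graph matches the blended teacher graph, and the per-sample weights let each sample favor whichever teacher is more reliable for it.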
Cite
Text
Zheng et al. "Adapt Without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72949-2_7
Markdown
[Zheng et al. "Adapt Without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zheng2024eccv-adapt/) doi:10.1007/978-3-031-72949-2_7
BibTeX
@inproceedings{zheng2024eccv-adapt,
title = {{Adapt Without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models}},
author = {Zheng, Mengyu and Tang, Yehui and Hao, Zhiwei and Han, Kai and Wang, Yunhe and Xu, Chang},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72949-2_7},
url = {https://mlanthology.org/eccv/2024/zheng2024eccv-adapt/}
}