RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing
Abstract
Vision-language models, such as CLIP, achieve strong zero-shot performance through contrastive pre-training but face significant challenges in class-incremental image classification scenarios. When learning new tasks sequentially, current methods suffer from degradation in prototype quality due to passive averaging and underutilize their visual adaptation capabilities. We propose RLAP-CLIP, which addresses these limitations through three components. First, Reinforcement Learning-based Prototype Optimization (RLPO) formulates prototype construction as a reinforcement learning problem to actively optimize class separability rather than relying on simple averaging. Second, difficulty-aware cross-modal fusion uses a mixture-of-experts to route samples through specialized processing pathways based on complexity. Third, dual-modal prompting balances visual and textual adaptation. Experiments on eight image classification benchmarks demonstrate consistent improvements, with RLAP-CLIP achieving average accuracy gains of 3.72-4.46 points and final accuracy improvements of 0.49-4.48 points over other methods, validating that RLAP-CLIP achieves state-of-the-art performance.
Cite
Text
Luo et al. "RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing." International Conference on Learning Representations, 2026.Markdown
[Luo et al. "RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/luo2026iclr-rlapclip/)BibTeX
@inproceedings{luo2026iclr-rlapclip,
title = {{RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing}},
author = {Luo, Ruikun and Wang, Jiarui and Gao, Yuan and Yang, Jing and Yang, Jieming and Wu, Song and Jin, Hai and Xia, Xiaoyu},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/luo2026iclr-rlapclip/}
}