RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing

Luo, Ruikun; Wang, Jiarui; Gao, Yuan; Yang, Jing; Yang, Jieming; Wu, Song; Jin, Hai; Xia, Xiaoyu

ICLR 2026

Abstract

Vision-language models, such as CLIP, achieve strong zero-shot performance through contrastive pre-training but face significant challenges in class-incremental image classification scenarios. When learning new tasks sequentially, current methods suffer from degradation in prototype quality due to passive averaging and underutilize their visual adaptation capabilities. We propose RLAP-CLIP, which addresses these limitations through three components. First, Reinforcement Learning-based Prototype Optimization (RLPO) formulates prototype construction as a reinforcement learning problem to actively optimize class separability rather than relying on simple averaging. Second, difficulty-aware cross-modal fusion uses a mixture-of-experts to route samples through specialized processing pathways based on complexity. Third, dual-modal prompting balances visual and textual adaptation. Experiments on eight image classification benchmarks demonstrate consistent improvements, with RLAP-CLIP achieving average accuracy gains of 3.72-4.46 points and final accuracy improvements of 0.49-4.48 points over other methods, validating that RLAP-CLIP achieves state-of-the-art performance.
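
To make the contrast in the abstract concrete, the sketch below shows the "passive averaging" baseline that RLPO is said to replace: each class prototype is the normalized mean of that class's features, and a sample is assigned to the prototype with the highest cosine similarity. It also includes a toy difficulty signal, the gap between the two best similarity scores, used to pick an "easy" or "hard" pathway. This is not the authors' method. The function names, the margin-based difficulty score, the threshold of 0.1, and the synthetic features are all assumptions made for illustration; the paper's RL formulation and mixture-of-experts routing are not reproduced here.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def mean_prototypes(features, labels, num_classes):
    """Baseline prototypes: average the unit-length features of each class.

    The per-class sum is re-normalized, which gives the same direction
    as the per-class mean.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    for feature, label in zip(features, labels):
        for i, x in enumerate(normalize(feature)):
            sums[label][i] += x
    return [normalize(s) for s in sums]


def cosine_scores(feature, prototypes):
    """Cosine similarity between one feature and every (unit-length) prototype."""
    f = normalize(feature)
    return [sum(a * b for a, b in zip(f, p)) for p in prototypes]


def predict(feature, prototypes):
    """Nearest-prototype classification."""
    scores = cosine_scores(feature, prototypes)
    return max(range(len(scores)), key=scores.__getitem__)


def route(feature, prototypes, margin_threshold=0.1):
    """Hypothetical difficulty rule: a small top-2 margin means a 'hard' sample.

    Requires at least two prototypes.
    """
    top = sorted(cosine_scores(feature, prototypes), reverse=True)
    return "hard" if top[0] - top[1] < margin_threshold else "easy"


# Synthetic stand-in for image features: three well-separated classes.
random.seed(0)
centers = [[5, 0, 0, 0], [0, 5, 0, 0], [0, 0, 5, 0]]
labels = [c for c in range(3) for _ in range(20)]
features = [[x + random.gauss(0, 0.5) for x in centers[y]] for y in labels]

prototypes = mean_prototypes(features, labels, num_classes=3)
accuracy = sum(predict(f, prototypes) == y for f, y in zip(features, labels)) / len(labels)
```

In a class-incremental setting, `mean_prototypes` would be called once per new task and the resulting prototypes appended to those from earlier tasks. Averaging in this way never checks how well the classes remain separated, which is the limitation the abstract attributes to existing methods.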

Cite

Text

Luo et al. "RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing." International Conference on Learning Representations, 2026.

Markdown

[Luo et al. "RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/luo2026iclr-rlapclip/)

BibTeX

@inproceedings{luo2026iclr-rlapclip,
  title     = {{RLAP-CLIP: Continual Multimodal Learning with Prototype Adaptation and Difficulty-Aware Routing}},
  author    = {Luo, Ruikun and Wang, Jiarui and Gao, Yuan and Yang, Jing and Yang, Jieming and Wu, Song and Jin, Hai and Xia, Xiaoyu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/luo2026iclr-rlapclip/}
}