Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Continual Multimodal Learning

Abstract

When deployed on non-stationary data streams, vision-language foundation models need continual updates without access to past data. However, naive fine-tuning erodes their zero-shot recognition capabilities and prompt robustness. We seek a replay-free principle that preserves pre-trained cross-modal generalization under domain and prompt shifts. We introduce Prompt-Invariant CCA Certificates (Pi-CCA), a geometry-first approach that summarizes image-text alignment with a compact certificate capturing the top-k canonical correlation spectrum and subspace. During adaptation, we match this certificate using only mini-batch statistics and induce prompt robustness by averaging over prompt perturbations. Across MTIL, X-TAIL, VLCL, and ConStruct-VL, Pi-CCA achieves state-of-the-art performance among replay-free methods. By optimizing alignment invariants rather than proxy signals, Pi-CCA provides a simple, generator-free, constant-memory path to continual adaptation with strong zero-shot retention and resilience to prompt and style shifts.
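The abstract does not give the certificate's exact formulas, so the following is only a minimal sketch of the idea under stated assumptions. It computes a standard regularized CCA from one mini-batch of paired image features X and text features Y (whitening each side, then taking an SVD of the whitened cross-covariance), keeps the top-k canonical correlations and directions as the "certificate", and scores drift with a hypothetical penalty: squared spectrum difference plus a projector distance between canonical subspaces. Prompt robustness is modeled, again as an assumption, by averaging that penalty over text features obtained from several perturbed prompts. All function names (cca_certificate, certificate_gap, prompt_averaged_gap) and the regularizer eps are illustrative, not taken from the paper; the code uses NumPy and computes no gradients.

```python
import numpy as np


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, Q = np.linalg.eigh(C)
    return (Q / np.sqrt(w)) @ Q.T


def cca_certificate(X, Y, k, eps=1e-4):
    """Top-k regularized CCA of paired image features X (n, dx) and text
    features Y (n, dy) from one mini-batch. Returns (rho, U, V): the top-k
    canonical correlations and the corresponding canonical directions."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Mini-batch covariances; eps * I (an assumed regularizer) keeps them invertible.
    Cxx = Xc.T @ Xc / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations
    # (returned in descending order by np.linalg.svd).
    A, s, Bt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    return s[:k], Wx @ A[:, :k], Wy @ Bt[:k].T


def proj_dist(A, B):
    """Squared Frobenius distance between the projectors onto span(A) and span(B)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return float(np.linalg.norm(Qa @ Qa.T - Qb @ Qb.T, "fro") ** 2)


def certificate_gap(ref, cur):
    """Hypothetical matching penalty: spectrum mismatch plus subspace mismatch."""
    rho_r, U_r, V_r = ref
    rho_c, U_c, V_c = cur
    spectrum = float(np.sum((rho_r - rho_c) ** 2))
    return spectrum + proj_dist(U_r, U_c) + proj_dist(V_r, V_c)


def prompt_averaged_gap(ref, X, Y_prompts, k):
    """Average the penalty over text features from several perturbed prompts."""
    return float(np.mean([certificate_gap(ref, cca_certificate(X, Y, k)) for Y in Y_prompts]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, dx, dy, k = 512, 16, 12, 4
    Z = rng.normal(size=(n, k))  # synthetic shared latent factors
    X = Z @ rng.normal(size=(k, dx)) + 0.5 * rng.normal(size=(n, dx))
    Y = Z @ rng.normal(size=(k, dy)) + 0.5 * rng.normal(size=(n, dy))
    ref = cca_certificate(X, Y, k)  # stored once, before adaptation
    # Stand-in for prompt perturbations: small noise on the text features.
    Y_prompts = [Y + 0.1 * rng.normal(size=Y.shape) for _ in range(3)]
    print(prompt_averaged_gap(ref, X, Y_prompts, k))
```

Comparing subspaces through projectors makes the penalty insensitive to the sign flips and basis rotations that an SVD is free to return. The stored certificate holds k + k(dx + dy) numbers regardless of how much data has been seen, which is one way to read the abstract's "constant-memory" claim. In actual fine-tuning the same computation would run inside an autodiff framework, where gradients through an SVD can become unstable when canonical correlations nearly coincide.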

Cite

Text

Zhang et al. "Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Continual Multimodal Learning." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Continual Multimodal Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-picca/)

BibTeX

@inproceedings{zhang2026iclr-picca,
  title     = {{Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Continual Multimodal Learning}},
  author    = {Zhang, Jiayu and Zhao, Chuangxin and Xiao, Canran and Duan, Ruibo and Mo, Wenyi and Gao, Haoyu and Wang, Wenshuo},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-picca/}
}