Efficient Multi-Modal Long Context Learning for Training-Free Adaptation

Abstract

Traditional approaches to adapting multi-modal large language models (MLLMs) to new tasks have relied heavily on fine-tuning. This paper introduces Efficient Multi-Modal Long Context Learning (EMLoC), a novel training-free alternative that embeds demonstration examples directly into the model input, offering a more efficient, flexible, and scalable route to task adaptation. Because extremely long inputs incur prohibitive computational and memory overhead, EMLoC introduces a chunk-wise compression mechanism combined with layer-wise adaptive pruning, condensing long-context multi-modal inputs into compact, task-specific memory representations. By adaptively pruning tokens at each layer under a Jensen-Shannon divergence constraint, our method achieves a dramatic reduction in inference complexity without sacrificing performance. This approach is the first to seamlessly integrate compression and pruning techniques for multi-modal long-context learning, offering a scalable and efficient solution for real-world applications. Extensive experiments on diverse vision-language benchmarks demonstrate that EMLoC achieves performance on par with or superior to naive long-context approaches. Our results highlight the potential of EMLoC as a framework for efficient and flexible adaptation of multi-modal models in resource-constrained environments. Code is publicly available at https://github.com/Zehong-Ma/EMLoC.
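To make the pruning idea concrete, the sketch below (PyTorch, not the authors' released code) shows one way layer-wise token pruning could be gated by a Jensen-Shannon divergence budget: increasingly aggressive keep ratios are accepted only while the pruned output distribution stays within a budget tau of the unpruned one. All names here (adaptive_prune, rescore, keep_ratios, tau) are illustrative assumptions, not EMLoC's actual API.

# Minimal sketch of JS-divergence-gated token pruning; assumed interfaces,
# not the EMLoC implementation.
import torch
import torch.nn.functional as F

def js_divergence(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # Jensen-Shannon divergence between probability vectors (last dim).
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.clamp_min(1e-12).log()
                            - b.clamp_min(1e-12).log())).sum(-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def adaptive_prune(logits_full, rescore, attn_weights, tau=0.05,
                   keep_ratios=(0.75, 0.5, 0.25)):
    # Accept the most aggressive keep ratio whose output distribution
    # stays within the JS-divergence budget `tau` of the unpruned one.
    p_full = F.softmax(logits_full, dim=-1)
    scores = attn_weights.mean(dim=0)        # per-token importance proxy
    order = scores.argsort(descending=True)  # most-attended tokens first
    keep_idx = order                         # default: prune nothing
    for r in keep_ratios:                    # increasingly aggressive cuts
        idx = order[: max(1, int(r * len(order)))]
        p_pruned = F.softmax(rescore(idx), dim=-1)
        if js_divergence(p_full, p_pruned) <= tau:
            keep_idx = idx                   # within budget: accept
        else:
            break                            # budget exceeded: stop
    return keep_idx

# Toy usage with random tensors; `rescore` stands in for re-running the
# layer on only the kept context tokens.
logits = torch.randn(100)
kept = adaptive_prune(logits, lambda idx: logits + 0.01 * torch.randn(100),
                      attn_weights=torch.rand(4, 16))
print(f"kept {kept.numel()} of 16 context tokens")

In this sketch the per-token importance proxy is mean attention mass, and a fixed ladder of keep ratios is scanned per layer; the paper's actual selection criterion and schedule may differ.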

Cite

Text

Ma et al. "Efficient Multi-Modal Long Context Learning for Training-Free Adaptation." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Ma et al. "Efficient Multi-Modal Long Context Learning for Training-Free Adaptation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/ma2025icml-efficient-a/)

BibTeX

@inproceedings{ma2025icml-efficient-a,
  title     = {{Efficient Multi-Modal Long Context Learning for Training-Free Adaptation}},
  author    = {Ma, Zehong and Zhang, Shiliang and Wei, Longhui and Tian, Qi},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {42236--42251},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/ma2025icml-efficient-a/}
}