MIRA: Multi-View Information Retrieval with Adaptive Routing for Test-Time Long-Video Comprehension

Abstract

Foundational Multi-modal Large Language Models (MLLMs) have made rapid progress on complex tasks across diverse modalities. However, they still struggle to deliver satisfactory performance on Long-video Comprehension (LVC) tasks involving thousands of frames. Existing optimization strategies fall broadly into three categories: LVC-specific fine-tuning, built-in token compression, and training-free keyframe extraction, the last of which is best suited to flexible deployment across various MLLMs. Unfortunately, current training-free approaches focus predominantly on query-frame relevance retrieval, overlooking other levels of visual information and the inherent heterogeneity of LVC tasks. In this work, we propose the $\textbf{M}$ulti-view $\textbf{I}$nformation $\textbf{R}$etrieval with $\textbf{A}$daptive Routing ($\textbf{MIRA}$) framework, which evaluates video frames using distinct metrics for relevance and causality, combines these scores to select a balanced pool of keyframes, and employs an adaptive feedback loop to tailor the retrieval process to different user queries, enabling more precise, sample-grained video comprehension. Extensive experiments across multiple challenging LVC benchmarks demonstrate the effectiveness of our scheme. For instance, integrating $\textbf{MIRA}$ with Qwen-2.5-VL yields performance gains of 3.5% to 13.1% on LVB, VideoMME and MLVU.
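As a rough illustration of the score-combination step described above, the following minimal sketch fuses two per-frame score views into one ranking and keeps the top-k frames. It is not the authors' implementation: the abstract does not specify how relevance and causality are computed or combined, so the min-max normalization, the weighted sum, and the names `min_max`, `select_keyframes`, and `alpha` are all assumptions made for illustration. The adaptive feedback loop is not modeled; the mixing weight `alpha` is simply set by hand.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def select_keyframes(
    relevance: Sequence[float],
    causality: Sequence[float],
    k: int,
    alpha: float = 0.5,
) -> List[int]:
    """Return k frame indices, in temporal order, ranked by a fused score.

    ASSUMPTION: the fusion rule is a weighted sum of normalized scores.
    `alpha` is a hypothetical mixing weight (1.0 = relevance only,
    0.0 = causality only); the paper's actual rule may differ.
    """
    if len(relevance) != len(causality):
        raise ValueError("both score views must cover the same frames")
    rel = min_max(relevance)
    cau = min_max(causality)
    fused = [alpha * r + (1.0 - alpha) * c for r, c in zip(rel, cau)]
    # Rank frames by fused score, highest first, then keep the top k.
    ranked = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    # Restore temporal order so the frames can be fed to an MLLM in sequence.
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy per-frame scores for a four-frame clip (made-up values).
    relevance = [0.1, 0.9, 0.2, 0.8]
    causality = [0.7, 0.1, 0.9, 0.2]
    print(select_keyframes(relevance, causality, k=2, alpha=1.0))
    print(select_keyframes(relevance, causality, k=2, alpha=0.0))
```

In this sketch, changing `alpha` shifts the selected pool between the two views. A query-adaptive scheme in the spirit of the abstract would choose that weight per query rather than fixing it globally.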

Cite

Text

Hao et al. "MIRA: Multi-View Information Retrieval with Adaptive Routing for Test-Time Long-Video Comprehension." Transactions on Machine Learning Research, 2026.

Markdown

[Hao et al. "MIRA: Multi-View Information Retrieval with Adaptive Routing for Test-Time Long-Video Comprehension." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/hao2026tmlr-mira/)

BibTeX

@article{hao2026tmlr-mira,
  title     = {{MIRA: Multi-View Information Retrieval with Adaptive Routing for Test-Time Long-Video Comprehension}},
  author    = {Hao, Zecheng and Ma, Wayne and Cui, Yufeng and Li, Shuang and Wang, Xinlong and Huang, Tiejun},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/hao2026tmlr-mira/}
}