MIRA: Multi-View Information Retrieval with Adaptive Routing for Test-Time Long-Video Comprehension
Abstract
Foundational Multi-modal Large Language Models (MLLMs) have achieved rapid progress in handling complex tasks across diverse modalities. However, they still struggle to deliver satisfactory performance on Long-video Comprehension (LVC) tasks involving thousands of frames. Existing optimization strategies can be broadly categorized into LVC-specific fine-tuning, built-in token compression, and training-free keyframe extraction, with the last being the most suitable for flexible deployment across various MLLMs. Unfortunately, current training-free approaches predominantly focus on query-frame relevance retrieval, overlooking other levels of visual information and the inherent heterogeneity of LVC tasks. In this work, we propose the $\textbf{M}$ulti-view $\textbf{I}$nformation $\textbf{R}$etrieval with $\textbf{A}$daptive Routing ($\textbf{MIRA}$) framework, which evaluates video frames using distinct metrics for relevance and causality, combines these scores to select a balanced pool of keyframes, and employs an adaptive feedback loop to tailor the retrieval process to different user queries, enabling more precise, sample-level video comprehension. Extensive experiments demonstrate the strong performance of our scheme across multiple challenging LVC benchmarks. For instance, integrating $\textbf{MIRA}$ with Qwen-2.5-VL yields performance gains of 3.5% to 13.1% on LVB, VideoMME, and MLVU.
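The score-combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `min_max` and `select_keyframes`, the mixing weight `alpha`, the min-max normalization, and the top-k selection rule are all assumptions made for illustration, and the adaptive feedback loop is not modeled. The per-frame relevance and causality scores are taken as given inputs.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def select_keyframes(
    relevance: Sequence[float],
    causality: Sequence[float],
    k: int,
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> List[int]:
    """Return the indices of the k frames with the highest combined score.

    Assumption: the two views are fused by a convex combination of
    min-max normalized scores. The paper may combine them differently.
    """
    if len(relevance) != len(causality):
        raise ValueError("score lists must have the same length")
    rel = min_max(relevance)
    cau = min_max(causality)
    combined = [alpha * r + (1.0 - alpha) * c for r, c in zip(rel, cau)]
    top = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)[:k]
    # Restore temporal order so the frames reach the model chronologically.
    return sorted(top)


if __name__ == "__main__":
    relevance = [0.1, 0.9, 0.2, 0.8, 0.0]
    causality = [0.0, 0.1, 0.9, 0.8, 0.2]
    print(select_keyframes(relevance, causality, k=2))
    print(select_keyframes(relevance, causality, k=2, alpha=1.0))
```

Setting `alpha=1.0` in this sketch reduces it to relevance-only retrieval, the setting the abstract attributes to existing training-free approaches, which makes the effect of the second view easy to compare.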
Cite
Text
Hao et al. "MIRA: Multi-View Information Retrieval with Adaptive Routing for Test-Time Long-Video Comprehension." Transactions on Machine Learning Research, 2026.
Markdown
[Hao et al. "MIRA: Multi-View Information Retrieval with Adaptive Routing for Test-Time Long-Video Comprehension." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/hao2026tmlr-mira/)
BibTeX
@article{hao2026tmlr-mira,
title = {{MIRA: Multi-View Information Retrieval with Adaptive Routing for Test-Time Long-Video Comprehension}},
author = {Hao, Zecheng and Ma, Wayne and Cui, Yufeng and Li, Shuang and Wang, Xinlong and Huang, Tiejun},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/hao2026tmlr-mira/}
}