Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking

Tran, Huu-Loc; Nguyen-Nhu, Tinh-Anh; Phan-Nguyen, Huu-Phong; Nguyen, Tien-Huy; Nguyen-Dich, Nhat-Minh; Dao, Anh; Do, Huy-Duc; Nguyen, Quan; Le, Hoang M.; Dinh, Quang-Vinh

CVPRW 2025 pp. 3719-3729

Abstract

Long-form video understanding presents significant challenges for interactive retrieval systems, as conventional methods struggle to process extensive video content efficiently. Existing approaches often rely on single models, inefficient storage, unstable temporal search, and context-agnostic reranking, limiting their effectiveness. This paper presents a novel framework to enhance interactive video retrieval through four key innovations: (1) an ensemble search strategy that integrates coarse-grained (CLIP) and fine-grained (BEIT3) models to improve retrieval accuracy, (2) a storage optimization technique that reduces redundancy by selecting representative keyframes via TransNetV2 and deduplication, (3) a temporal search mechanism that localizes video segments using dual queries for start and end points, and (4) a temporal reranking approach that leverages neighboring frame context to stabilize rankings. Evaluated on known-item search and question-answering tasks, our framework demonstrates substantial improvements in retrieval precision, efficiency, and user interpretability, offering a robust solution for real-world interactive video retrieval applications.
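The abstract names three scoring ideas without giving formulas: fusing coarse and fine model scores, stabilizing rankings with neighboring-frame context, and localizing a segment from separate start and end queries. The following is a minimal sketch of how such steps could look, not the authors' implementation. The min-max normalization, the weighted-sum fusion, the mean over a fixed window, the additive start/end scoring, and all function and parameter names (`fuse_scores`, `temporal_rerank`, `best_segment`, `weight`, `radius`, `max_len`) are assumptions made for illustration; the per-frame similarity scores are assumed to be computed elsewhere (for example by CLIP and BEIT3 encoders).

```python
from typing import List, Sequence, Tuple


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so two models become comparable."""
    if not scores:
        raise ValueError("scores must not be empty")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(coarse: Sequence[float], fine: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Assumed ensemble rule: weighted sum of normalized per-frame scores."""
    if len(coarse) != len(fine):
        raise ValueError("both models must score the same frames")
    c, f = min_max(coarse), min_max(fine)
    return [weight * a + (1.0 - weight) * b for a, b in zip(c, f)]


def temporal_rerank(scores: Sequence[float], radius: int = 1) -> List[float]:
    """Assumed reranking rule: average each frame's score with its
    neighbors within `radius` frames (window is clipped at the ends)."""
    n = len(scores)
    smoothed = []
    for i in range(n):
        window = scores[max(0, i - radius):min(n, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def best_segment(start_scores: Sequence[float], end_scores: Sequence[float],
                 max_len: int) -> Tuple[int, int]:
    """Assumed dual-query rule: pick frame indices (s, e) with
    s <= e <= s + max_len that maximize start_scores[s] + end_scores[e]."""
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("need two non-empty score lists of equal length")
    best_total, best_pair = None, (0, 0)
    for s in range(len(start_scores)):
        for e in range(s, min(len(end_scores), s + max_len + 1)):
            total = start_scores[s] + end_scores[e]
            if best_total is None or total > best_total:
                best_total, best_pair = total, (s, e)
    return best_pair


if __name__ == "__main__":
    fused = fuse_scores([0.2, 0.9, 0.3, 0.1], [0.1, 0.2, 0.8, 0.7])
    print(temporal_rerank(fused, radius=1))
    print(best_segment([0.1, 0.9, 0.2, 0.1], [0.1, 0.2, 0.3, 0.8], max_len=2))
```

In this sketch an isolated high score is pulled down by weak neighbors, while a frame surrounded by other relevant frames keeps a high score, which is one plausible reading of "leverages neighboring frame context to stabilize rankings".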

Cite

Text

Tran et al. "Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Tran et al. "Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/tran2025cvprw-efficient/)

BibTeX

@inproceedings{tran2025cvprw-efficient,
  title     = {{Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking}},
  author    = {Tran, Huu-Loc and Nguyen-Nhu, Tinh-Anh and Phan-Nguyen, Huu-Phong and Nguyen, Tien-Huy and Nguyen-Dich, Nhat-Minh and Dao, Anh and Do, Huy-Duc and Nguyen, Quan and Le, Hoang M. and Dinh, Quang-Vinh},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {3719--3729},
  url       = {https://mlanthology.org/cvprw/2025/tran2025cvprw-efficient/}
}