AI-Based Video Content Understanding for Automatic and Interactive Multimedia Retrieval

Abstract

We present diveXplore, a distributed system for AI-based video content understanding and retrieval, which will be used in the interactive task of the IViSE 2025 workshop. The system combines state-of-the-art deep learning components for shot segmentation, text and speech recognition, and multimodal embeddings with a scalable architecture designed for efficient storage, querying, and user interaction. A key feature of the frontend is an intuitive web-based GUI that supports free-text and semantic search, video summarization, and temporal query composition. We evaluate the performance of a newly developed keyframe scrubbing feature and conduct a qualitative user experiment based on all IViSE 2025 known-item search (KIS) tasks. The results demonstrate the system's effectiveness in interactive video retrieval and inform a set of improvements for future versions.

Cite

Text

Schoeffmann and Leopold. "AI-Based Video Content Understanding for Automatic and Interactive Multimedia Retrieval." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Schoeffmann and Leopold. "AI-Based Video Content Understanding for Automatic and Interactive Multimedia Retrieval." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/schoeffmann2025cvprw-aibased/)

BibTeX

@inproceedings{schoeffmann2025cvprw-aibased,
  title     = {{AI-Based Video Content Understanding for Automatic and Interactive Multimedia Retrieval}},
  author    = {Schoeffmann, Klaus and Leopold, Mario},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {3750--3758},
  url       = {https://mlanthology.org/cvprw/2025/schoeffmann2025cvprw-aibased/}
}