Video Scene Segmentation with Genre and Duration Signals

Abstract

Video scene segmentation aims to detect semantically coherent boundaries in long-form videos, bridging the gap between low-level visual signals and high-level narrative understanding. However, existing methods primarily rely on visual similarity between adjacent shots, which makes it difficult to accurately identify scene boundaries, especially when semantic transitions do not align with visual changes. In this paper, we propose a novel approach that incorporates production-level metadata, specifically genre conventions and shot duration patterns, into video scene segmentation. Our main contributions are threefold: (1) we leverage textual genre definitions as semantic priors to guide shot-level representation learning during self-supervised pretraining, enabling better capture of narrative coherence; (2) we introduce a duration-aware anchor selection strategy that prioritizes shorter shots based on empirical duration statistics, improving pseudo-boundary generation quality; (3) we propose a test-time shot splitting strategy that subdivides long shots into segments for improved temporal modeling. Experimental results demonstrate state-of-the-art performance on the MovieNet-SSeg and BBC datasets. We also introduce MovieChat-SSeg, which extends MovieChat-1K with manually annotated scene boundaries across 1,000 videos spanning movies, TV series, and documentaries.
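
As a rough illustration of contributions (2) and (3), the sketch below shows one way that duration-aware anchor weighting and test-time shot splitting could look. It is not the authors' implementation: the function names, the `max_duration` threshold, the equal-length split, and the inverse-duration weighting are all assumptions made for this example, since the abstract does not specify them.

```python
import math


def split_long_shots(shots, max_duration=8.0):
    """Split each (start, end) shot longer than max_duration into equal pieces.

    Hypothetical helper: the threshold and the equal-length split are
    assumptions, not details taken from the paper.
    """
    segments = []
    for start, end in shots:
        duration = end - start
        # Smallest number of pieces such that no piece exceeds max_duration.
        n_pieces = max(1, math.ceil(duration / max_duration))
        step = duration / n_pieces
        for i in range(n_pieces):
            seg_start = start + i * step
            # Reuse the exact shot end for the last piece to avoid float drift.
            seg_end = end if i == n_pieces - 1 else start + (i + 1) * step
            segments.append((seg_start, seg_end))
    return segments


def anchor_weights(durations):
    """Normalized sampling weights that favor shorter shots.

    Hypothetical helper: inverse-duration weighting is one simple way to
    "prioritize shorter shots"; the paper's actual statistics may differ.
    """
    inverse = [1.0 / d for d in durations]
    total = sum(inverse)
    return [w / total for w in inverse]


if __name__ == "__main__":
    shots = [(0.0, 4.0), (4.0, 24.0)]
    print(split_long_shots(shots, max_duration=8.0))
    print(anchor_weights([1.0, 2.0, 4.0]))
```

In this sketch a 20-second shot with an 8-second threshold becomes three equal segments, while shots at or below the threshold pass through unchanged. The weights returned by `anchor_weights` could be passed to a sampler such as `random.choices` when picking anchor shots for pseudo-boundary generation.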

Cite

Text

Cho et al. "Video Scene Segmentation with Genre and Duration Signals." International Conference on Learning Representations, 2026.

Markdown

[Cho et al. "Video Scene Segmentation with Genre and Duration Signals." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/cho2026iclr-video/)

BibTeX

@inproceedings{cho2026iclr-video,
  title     = {{Video Scene Segmentation with Genre and Duration Signals}},
  author    = {Cho, Jungu and Ha, Seong Jong and Jeon, Hae-Gon},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/cho2026iclr-video/}
}