SPaSe - Multi-Label Page Segmentation for Presentation Slides

Haurilet, Monica; Al-Halah, Ziad; Stiefelhagen, Rainer

doi:10.1109/WACV.2019.00082

WACV 2019 pp. 726-734

Abstract

We introduce the first benchmark dataset for slide-page segmentation. Presentation slides are one of the most prominent document types used to exchange ideas across the web, educational institutions, and businesses. This document format has a complex layout that contains a rich variety of graphical (e.g., diagram, logo), textual (e.g., heading, affiliation), and structural (e.g., enumeration, legend) components. This vast and popular knowledge source remains inaccessible to modern machine learning techniques due to a lack of annotated data. To tackle this issue, we introduce SPaSe (Slide Page Segmentation), a novel dataset containing a total of 2000 slides with dense, pixel-wise annotations of 25 classes. We show that slide segmentation has several interesting properties that characterize the task. Unlike in the common image segmentation problem, regions of distinct classes tend to overlap heavily, which makes this a multi-label segmentation task. Furthermore, many of the classes frequently encountered in slides are location-sensitive (e.g., title, footnote). Hence, we believe our dataset represents a challenging and interesting benchmark for novel segmentation models. Finally, we evaluate state-of-the-art deep segmentation models on our dataset and show that it is suitable for developing deep learning models without any need for pre-training. Our dataset will be released to the public to foster further research on this interesting task.
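To make the multi-label aspect concrete, the following is a minimal sketch of how overlapping pixel-wise annotations could be represented and scored. It is not the authors' code or the dataset's file format: the three class names are taken from the abstract's examples, while the slide size, the region layout (a legend lying inside a diagram), and the helper names `empty_mask` and `per_class_iou` are assumptions made only for illustration.

```python
import numpy as np

# Illustrative subset of classes; the actual dataset defines 25.
CLASSES = ["title", "diagram", "legend"]

def empty_mask(height, width, num_classes=len(CLASSES)):
    """Multi-hot annotation: one binary plane per class (hypothetical helper)."""
    return np.zeros((num_classes, height, width), dtype=bool)

def per_class_iou(pred, target):
    """Intersection-over-union computed independently for each class plane."""
    inter = np.logical_and(pred, target).sum(axis=(1, 2))
    union = np.logical_or(pred, target).sum(axis=(1, 2))
    # Classes absent from both prediction and target are reported as NaN.
    return np.where(union > 0, inter / np.maximum(union, 1), np.nan)

# Assumed toy slide of 4 x 6 pixels.
target = empty_mask(4, 6)
target[0, 0, :] = True      # title spans the top row
target[1, 1:4, :] = True    # diagram covers rows 1-3
target[2, 2:4, :] = True    # legend lies inside the diagram

# Pixels carrying more than one label, which a single-label mask cannot encode.
overlap = target.sum(axis=0) > 1

pred = target.copy()
pred[2, 2, :] = False       # prediction misses half of the legend
iou = per_class_iou(pred, target)
```

Keeping one binary plane per class, rather than a single integer label per pixel, is what allows a pixel to belong to both the diagram and its legend; each plane can then be evaluated on its own.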

Cite

Text

Haurilet et al. "SPaSe - Multi-Label Page Segmentation for Presentation Slides." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019. doi:10.1109/WACV.2019.00082

Markdown

[Haurilet et al. "SPaSe - Multi-Label Page Segmentation for Presentation Slides." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019.](https://mlanthology.org/wacv/2019/haurilet2019wacv-spase/) doi:10.1109/WACV.2019.00082

BibTeX

@inproceedings{haurilet2019wacv-spase,
  title     = {{SPaSe - Multi-Label Page Segmentation for Presentation Slides}},
  author    = {Haurilet, Monica and Al-Halah, Ziad and Stiefelhagen, Rainer},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2019},
  pages     = {726--734},
  doi       = {10.1109/WACV.2019.00082},
  url       = {https://mlanthology.org/wacv/2019/haurilet2019wacv-spase/}
}