SPaSe - Multi-Label Page Segmentation for Presentation Slides
Abstract
We introduce the first benchmark dataset for slide-page segmentation. Presentation slides are one of the most prominent document types used to exchange ideas across the web, in educational institutions, and in businesses. This document format is characterized by a complex layout containing a rich variety of graphical (e.g. diagram, logo), textual (e.g. heading, affiliation), and structural components (e.g. enumeration, legend). This vast and popular knowledge source remains out of reach for modern machine learning techniques due to the lack of annotated data. To tackle this issue, we introduce SPaSe (Slide Page Segmentation), a novel dataset containing 2000 slides with dense, pixel-wise annotations of 25 classes. We show that slide segmentation has several interesting properties that characterize this task. Unlike in the common image segmentation problem, regions of distinct classes tend to overlap heavily, which makes this segmentation task a multi-label problem. Furthermore, many of the classes frequently encountered in slides are location sensitive (e.g. title, footnote). Hence, we believe our dataset represents a challenging and interesting benchmark for novel segmentation models. Finally, we evaluate state-of-the-art deep segmentation models on our dataset and show that it is suitable for developing deep learning models without the need for pre-training. Our dataset will be released to the public to foster further research on this task.
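To illustrate why overlapping regions turn slide segmentation into a multi-label problem, the following minimal Python sketch stores one binary region per class, so a single pixel can carry several labels (something a map with one class index per pixel cannot express), and scores a prediction with per-class intersection over union. The slide size, the region coordinates, the placement of the `legend` inside the `diagram`, and the helpers `box`, `labels_at`, `iou`, and `mean_iou` are hypothetical illustrations; this is not the SPaSe annotation format or the paper's evaluation protocol.

```python
from typing import Dict, Set, Tuple

# A pixel is addressed as (row, column).
Pixel = Tuple[int, int]


def box(top: int, left: int, bottom: int, right: int) -> Set[Pixel]:
    """Pixels in the half-open rectangle rows [top, bottom) x columns [left, right)."""
    return {(r, c) for r in range(top, bottom) for c in range(left, right)}


def labels_at(masks: Dict[str, Set[Pixel]], pixel: Pixel) -> Set[str]:
    """All classes whose region contains the pixel; may be more than one."""
    return {name for name, region in masks.items() if pixel in region}


def iou(pred: Set[Pixel], gt: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets (1.0 if both are empty)."""
    union = pred | gt
    if not union:
        return 1.0
    return len(pred & gt) / len(union)


def mean_iou(pred: Dict[str, Set[Pixel]], gt: Dict[str, Set[Pixel]]) -> float:
    """Average IoU over all classes, treating each class as its own binary mask."""
    classes = sorted(set(pred) | set(gt))
    if not classes:
        return 1.0
    return sum(iou(pred.get(k, set()), gt.get(k, set())) for k in classes) / len(classes)


# Hypothetical 10x10 slide: the legend lies inside the diagram, so those pixels
# carry two labels at once.
ground_truth = {
    "title": box(0, 0, 1, 10),
    "diagram": box(2, 2, 8, 8),
    "legend": box(6, 6, 8, 8),
}
prediction = {
    "title": box(0, 0, 1, 10),
    "diagram": box(2, 2, 8, 8),
    "legend": box(6, 6, 8, 9),  # one column too wide
}

print(sorted(labels_at(ground_truth, (7, 7))))  # ['diagram', 'legend']
print(round(iou(prediction["legend"], ground_truth["legend"]), 4))  # 0.6667
print(round(mean_iou(prediction, ground_truth), 4))  # 0.8889
```

Because each class is scored as an independent binary mask, a pixel inside both the diagram and its legend counts toward both classes, whereas a single-label (argmax) map would be forced to assign it to only one of them.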
Cite
Text
Haurilet et al. "SPaSe - Multi-Label Page Segmentation for Presentation Slides." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019. doi:10.1109/WACV.2019.00082
Markdown
[Haurilet et al. "SPaSe - Multi-Label Page Segmentation for Presentation Slides." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019.](https://mlanthology.org/wacv/2019/haurilet2019wacv-spase/) doi:10.1109/WACV.2019.00082
BibTeX
@inproceedings{haurilet2019wacv-spase,
title = {{SPaSe - Multi-Label Page Segmentation for Presentation Slides}},
author = {Haurilet, Monica and Al-Halah, Ziad and Stiefelhagen, Rainer},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2019},
pages = {726--734},
doi = {10.1109/WACV.2019.00082},
url = {https://mlanthology.org/wacv/2019/haurilet2019wacv-spase/}
}