Semantic Segmentation Using Foundation Models for Cultural Heritage: An Experimental Study on Notre-Dame De Paris

Abstract

The zero-shot performance of foundation models has attracted considerable attention. In particular, the Segment Anything Model (SAM) has gained popularity in computer vision for its label-free segmentation capabilities. Our study applies SAM to cultural heritage data, specifically images of Notre-Dame de Paris, together with a controlled vocabulary. SAM successfully identifies objects within the cathedral. To further improve segmentation, we use Grounding DINO to detect objects and CLIP to automatically label the segmentation masks generated by SAM. Our study demonstrates the usefulness of foundation models for zero-shot semantic segmentation of cultural heritage data.
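The labeling step described above pairs each SAM mask with the closest term in the controlled vocabulary by comparing CLIP image and text embeddings. A minimal sketch of that matching logic is shown below; it uses stand-in NumPy vectors in place of real CLIP features, and the function and vocabulary names are illustrative, not taken from the paper's code.

```python
import numpy as np

def assign_labels(mask_embs, label_embs, labels):
    """Assign each mask crop the vocabulary label whose text embedding
    is most similar under cosine similarity (the CLIP-style matching rule).

    mask_embs:  (n_masks, d) image embeddings, one per SAM mask crop
    label_embs: (n_labels, d) text embeddings, one per vocabulary term
    labels:     list of n_labels vocabulary strings
    """
    # L2-normalize so the dot product equals cosine similarity
    m = mask_embs / np.linalg.norm(mask_embs, axis=1, keepdims=True)
    t = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    sims = m @ t.T  # (n_masks, n_labels) similarity matrix
    return [labels[i] for i in sims.argmax(axis=1)]

# Toy example: three vocabulary terms (hypothetical controlled vocabulary)
# and two mask embeddings, each close to one term's embedding.
labels = ["gargoyle", "stained glass", "vault"]
label_embs = np.eye(3)
mask_embs = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.8, 0.2]])
print(assign_labels(mask_embs, label_embs, labels))
```

In the actual pipeline the embeddings would come from CLIP's image encoder (applied to each mask crop) and text encoder (applied to each vocabulary term); the matching rule itself is just this nearest-neighbor search in the shared embedding space.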

Cite

Text

Réby et al. "Semantic Segmentation Using Foundation Models for Cultural Heritage: An Experimental Study on Notre-Dame De Paris." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00184

Markdown

[Réby et al. "Semantic Segmentation Using Foundation Models for Cultural Heritage: An Experimental Study on Notre-Dame De Paris." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/reby2023iccvw-semantic/) doi:10.1109/ICCVW60793.2023.00184

BibTeX

@inproceedings{reby2023iccvw-semantic,
  title     = {{Semantic Segmentation Using Foundation Models for Cultural Heritage: An Experimental Study on Notre-Dame De Paris}},
  author    = {Réby, Kévin and Guilhelm, Anaïs and De Luca, Livio},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {1681--1689},
  doi       = {10.1109/ICCVW60793.2023.00184},
  url       = {https://mlanthology.org/iccvw/2023/reby2023iccvw-semantic/}
}