An Approach for Dataset Extension for Object Detection in Artworks Using Open-Vocabulary Models
Abstract
When studying objects depicted in paintings, art historians identify their significance, symbolic meaning, and historical context. Analyzing large art collections can be very time-consuming for specialists. Modern object detectors could ease this search, but they perform poorly on artistic images. This problem can be addressed by fine-tuning them on specialized annotated datasets. In this paper, we explore the use of open-vocabulary foundation models for semi-automated dataset annotation. We propose an approach to annotating artistic datasets for the object detection task that starts from a small set of images annotated at the image level and uses the Vision Transformer for Open-World Localization (OWL-ViT2) model, the YOLO object detector, and the Approximate Nearest Neighbors Oh Yeah (ANNOY) algorithm. We extend the existing DEArt dataset by 97.2% and introduce a way of adding new classes without exhaustive annotation. With the extended version of the dataset, we achieve an average 12.2% increase in the mAP0.5 metric on the test data compared to the model trained on the original dataset.
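For readers unfamiliar with the mAP0.5 metric mentioned in the abstract: a predicted box is conventionally counted as a correct detection when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion only, not the authors' evaluation code. The function names `iou` and `is_true_positive` and the `(x_min, y_min, x_max, y_max)` box format are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) tuples; this format is an
    assumption for the sketch, not taken from the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: does the prediction match at the given IoU threshold?"""
    return iou(pred_box, gt_box) >= threshold
```

Average precision is then computed per class from the ranked list of such matches, and mAP is the mean over classes. That aggregation step is omitted here.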
Cite
Text
Yemelianenko et al. "An Approach for Dataset Extension for Object Detection in Artworks Using Open-Vocabulary Models." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91572-7_18
Markdown
[Yemelianenko et al. "An Approach for Dataset Extension for Object Detection in Artworks Using Open-Vocabulary Models." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/yemelianenko2024eccvw-approach/) doi:10.1007/978-3-031-91572-7_18
BibTeX
@inproceedings{yemelianenko2024eccvw-approach,
title = {{An Approach for Dataset Extension for Object Detection in Artworks Using Open-Vocabulary Models}},
author = {Yemelianenko, Tetiana and Tkachenko, Iuliia and Masclef, Tess and Scuturici, Mihaela and Miguet, Serge},
booktitle = {European Conference on Computer Vision Workshops},
year = {2024},
pages = {295-311},
doi = {10.1007/978-3-031-91572-7_18},
url = {https://mlanthology.org/eccvw/2024/yemelianenko2024eccvw-approach/}
}