Unseen and Adverse Outdoor Scenes Recognition Through Event-Based Captions
Abstract
This paper presents EventCAP, i.e., event-based captions, which produces refined and enriched qualitative and quantitative captions by combining Deep Learning (DL) models and Vision Language Models (VLMs) with different tasks in a complementary manner. Indoor and outdoor images are widely used for object recognition and captioning. However, outdoor scenes vary widely across events due to natural phenomena, i.e., weather changes. Such dynamic changes may degrade segmentation through illumination and object-shape changes, increasing the number of unseen objects and scenes under adverse conditions. On the other hand, single state-of-the-art (SOTA) DL and VLM models handle only single or limited tasks. Therefore, this paper proposes EventCAP, which produces captions with physical scales and objects' surface properties. Moreover, an iterative VQA model is proposed to refine incompletely segmented images with prompts. A higher semantic level in captions for real-world scene descriptions is experimentally shown compared to SOTA VLMs.
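The abstract's iterative VQA refinement, in which prompts query a VQA model about incompletely segmented regions and fold the answers back into the caption, can be illustrated with a minimal sketch. Everything below is a hypothetical placeholder: `segment`, `run_vqa`, and the prompt format are stand-in stubs, not the paper's actual interfaces.

```python
def segment(image):
    """Stub segmenter: returns regions it could label and those it could not.
    A real system would run a segmentation network here."""
    return {"labeled": ["road", "sky"], "unlabeled": ["region_3"]}


def run_vqa(image, prompt):
    """Stub VQA model: answers a prompt about the image.
    A real system would query a VLM; we return a fixed answer."""
    return "wet asphalt" if "region_3" in prompt else "unknown"


def refine_captions(image, max_iters=3):
    """Iteratively prompt the VQA model about each unlabeled segment
    and add any confident answer to the caption terms."""
    seg = segment(image)
    captions = list(seg["labeled"])
    for region in seg["unlabeled"]:
        for _ in range(max_iters):
            answer = run_vqa(image, f"What surface is in {region}?")
            if answer != "unknown":
                captions.append(answer)
                break
    return captions
```

The loop structure (prompt, check, retry up to `max_iters`) is one plausible reading of "refine incompletely segmented images with prompts"; the paper itself defines the actual refinement criteria.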
Cite
Text
Sakaino. "Unseen and Adverse Outdoor Scenes Recognition Through Event-Based Captions." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00387
Markdown
[Sakaino. "Unseen and Adverse Outdoor Scenes Recognition Through Event-Based Captions." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/sakaino2023iccvw-unseen/) doi:10.1109/ICCVW60793.2023.00387
BibTeX
@inproceedings{sakaino2023iccvw-unseen,
title = {{Unseen and Adverse Outdoor Scenes Recognition Through Event-Based Captions}},
author = {Sakaino, Hidetomo},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2023},
pages = {3596-3603},
doi = {10.1109/ICCVW60793.2023.00387},
url = {https://mlanthology.org/iccvw/2023/sakaino2023iccvw-unseen/}
}