Generating Construction Safety Observations via CLIP-Based Image-Language Embedding
Abstract
Safety inspections are standard practice for preventing accidents on construction sites. Traditional workflows require an inspector to document violations through photos and textual descriptions that explain the specific incident in terms of objects, actions, and context. However, the documentation process is time-consuming, and the content is inconsistent: the same violation can be captioned in many different ways, which makes downstream safety analysis difficult. Research has investigated ways to improve documentation efficiency through applications with standardized forms and to develop language-understanding models that analyze safety reports. Nevertheless, it remains challenging to streamline the entire documentation process and accurately compile reports into meaningful information. We propose an image-language embedding model that automatically generates textual safety observations through fine-tuning of Contrastive Language-Image Pre-training (CLIP) and CLIP prefix captioning designed for the construction safety context. CLIP obtains contrastive features to classify the safety attribute types of images, and CLIP prefix captioning generates a caption from the given safety attributes, images, and captions. The framework is evaluated on a construction safety report dataset and can generate reasonable textual information for safety inspectors.
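The attribute-classification step described in the abstract follows the standard CLIP zero-shot recipe: embed the site photo and one text prompt per safety attribute, then rank attributes by cosine similarity. The sketch below illustrates that scoring step with plain NumPy on placeholder embeddings; the attribute labels, embedding dimensionality, and the logit scale of 100 (CLIP's default temperature) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def classify_safety_attribute(image_emb, text_embs, labels, logit_scale=100.0):
    """CLIP-style zero-shot classification.

    image_emb : (d,) embedding of the site photo (placeholder here).
    text_embs : (k, d) one embedding per safety-attribute prompt.
    labels    : list of k attribute names.
    Returns the best-matching label and the softmax probabilities.
    """
    # L2-normalize so the dot product is cosine similarity, as in CLIP.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * (txt @ img)
    # Numerically stable softmax over the attribute logits.
    logits -= logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return labels[int(np.argmax(probs))], probs

# Toy example with hand-made embeddings (hypothetical attribute names):
labels = ["missing guardrail", "no hard hat", "unsecured ladder"]
text_embs = np.eye(3)                      # stand-ins for encoded prompts
image_emb = np.array([0.9, 0.1, 0.0])      # stand-in for the encoded photo
best, probs = classify_safety_attribute(image_emb, text_embs, labels)
```

In the paper's pipeline, the attribute predicted this way conditions the prefix-captioning stage, which maps the CLIP image embedding to a prefix that a language model decodes into the safety observation.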
Cite
Text
Tsai et al. "Generating Construction Safety Observations via CLIP-Based Image-Language Embedding." European Conference on Computer Vision Workshops, 2022. doi:10.1007/978-3-031-25082-8_24
Markdown
[Tsai et al. "Generating Construction Safety Observations via CLIP-Based Image-Language Embedding." European Conference on Computer Vision Workshops, 2022.](https://mlanthology.org/eccvw/2022/tsai2022eccvw-generating/) doi:10.1007/978-3-031-25082-8_24
BibTeX
@inproceedings{tsai2022eccvw-generating,
title = {{Generating Construction Safety Observations via CLIP-Based Image-Language Embedding}},
author = {Tsai, Wei Lun and Lin, Jacob J. and Hsieh, Shang-Hsien},
booktitle = {European Conference on Computer Vision Workshops},
year = {2022},
pages = {366-381},
doi = {10.1007/978-3-031-25082-8_24},
url = {https://mlanthology.org/eccvw/2022/tsai2022eccvw-generating/}
}