Vision + Language Applications: A Survey

Zhou, Yutong; Shimada, Nobutaka

doi:10.1109/CVPRW59228.2023.00090

CVPRW 2023 pp. 826-842

Abstract

Text-to-image generation has attracted significant interest from researchers and practitioners in recent years due to its widespread and diverse applications across various industries. Despite the progress made in the domain of vision and language research, the existing literature remains relatively limited, particularly with regard to advancements and applications in this field. This paper explores a relevant research track within multimodal applications, including text, vision, audio, and others. In addition to the studies discussed in this paper, we are also committed to continually updating the latest relevant papers, datasets, application projects and corresponding information at https://github.com/Yutong-Zhou-cv/Awesome-Text-to-Image.

Cite

Text

Zhou and Shimada. "Vision + Language Applications: A Survey." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00090

Markdown

[Zhou and Shimada. "Vision + Language Applications: A Survey." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/zhou2023cvprw-vision/) doi:10.1109/CVPRW59228.2023.00090

BibTeX

@inproceedings{zhou2023cvprw-vision,
  title     = {{Vision + Language Applications: A Survey}},
  author    = {Zhou, Yutong and Shimada, Nobutaka},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {826--842},
  doi       = {10.1109/CVPRW59228.2023.00090},
  url       = {https://mlanthology.org/cvprw/2023/zhou2023cvprw-vision/}
}