DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation

Yi-Hao Peng, Faria Huq, Yue Jiang, Jason Wu, Xin Yue Li, Jeffrey Bigham, Amy Pavel

ECCV 2024

doi:10.1007/978-3-031-72691-0_26

Abstract

Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with target labels using code generation. Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples. We demonstrate performance improvements in three tasks for understanding slides and UIs: recognizing visual elements, describing visual content, and classifying visual content types.
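
The idea of "built-in labels" can be illustrated with a minimal sketch. It assumes the generated code is HTML-like markup, which the abstract does not specify, and it is not the authors' pipeline. The `TAG_TO_LABEL` mapping, the `LabelCollector` class, the `extract_labels` helper, and the sample markup are all hypothetical names introduced for illustration. The point is only that when a visual is produced from code, element labels can be read off the code itself rather than annotated by hand.

```python
from html.parser import HTMLParser

# Hypothetical mapping from markup tags to element labels (an assumption,
# not taken from the paper).
TAG_TO_LABEL = {"h1": "title", "p": "text", "img": "image", "button": "button"}


class LabelCollector(HTMLParser):
    """Records a label for every recognized opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        # Tags outside the mapping (e.g. layout containers) are ignored.
        if tag in TAG_TO_LABEL:
            self.labels.append(TAG_TO_LABEL[tag])


def extract_labels(markup):
    """Return the element labels implied by a piece of generated markup."""
    collector = LabelCollector()
    collector.feed(markup)
    collector.close()
    return collector.labels


# Stand-in for a synthetic slide produced by a code generator.
synthetic_slide = (
    "<div><h1>Quarterly Results</h1>"
    "<p>Revenue grew.</p>"
    "<img src='chart.png'></div>"
)
print(extract_labels(synthetic_slide))  # ['title', 'text', 'image']
```

A real dataset for recognizing visual elements would also need element positions, which would require rendering the markup; this sketch covers only the label-extraction step.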

Cite

Text

Peng et al. "DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72691-0_26

Markdown

[Peng et al. "DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/peng2024eccv-dreamstruct/) doi:10.1007/978-3-031-72691-0_26

BibTeX

@inproceedings{peng2024eccv-dreamstruct,
  title     = {{DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation}},
  author    = {Peng, Yi-Hao and Huq, Faria and Jiang, Yue and Wu, Jason and Li, Xin Yue and Bigham, Jeffrey and Pavel, Amy},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72691-0_26},
  url       = {https://mlanthology.org/eccv/2024/peng2024eccv-dreamstruct/}
}