ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Baechler, Gilles; Sunkara, Srinivas; Wang, Maria; Zubach, Fedir; Mansoor, Hassan; Etter, Vincent; Carbune, Victor; Lin, Jason; Chen, Jindong; Sharma, Abhanshu

doi:10.24963/ijcai.2024/339

IJCAI 2024 pp. 3058-3068

Abstract

The k-nearest neighbor (kNN) query is a cornerstone of similarity-based applications across various domains. While prior work has enhanced kNN search efficiency, it typically focuses on approximate methods for high-dimensional data or exact methods for low-dimensional data, often assuming static query and data distributions. This creates a significant gap in accelerating exact kNN search for low-to-medium dimensional data with dynamic query distributions. To fill this gap, we propose App2Exa, a cache-guided framework that integrates approximate and exact kNN search. App2Exa utilizes a dynamically maintained cache graph index to retrieve approximate results, which subsequently guide exact search using a VP-Tree with a best-first strategy. A benefit-driven caching mechanism further optimizes performance by prioritizing vectors based on frequency, recency, and computational cost. Experimental results demonstrate that App2Exa significantly boosts efficiency, providing a robust and scalable solution for evolving query patterns and enabling exact kNN search to support higher dimensionality more effectively.
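The idea of letting approximate results guide an exact best-first VP-Tree search can be illustrated with a minimal sketch. This is not the authors' implementation: the names `build_vp_tree` and `exact_knn` are hypothetical, the distance is assumed to be Euclidean, and the cache graph index is stood in for by whatever list of candidate ids the caller passes as `seed_ids`. The sketch only shows why seeding helps: the k-th smallest seed distance is an upper bound on the true k-th neighbor distance, so subtrees whose lower bound exceeds it can be skipped from the start.

```python
import heapq
import itertools
import math
import random


class VPNode:
    """One vantage point plus the radius that splits its remaining points."""

    def __init__(self, idx, radius, inside, outside):
        self.idx = idx          # index of the vantage point in `data`
        self.radius = radius    # inside: dist <= radius, outside: dist >= radius
        self.inside = inside
        self.outside = outside


def build_vp_tree(data, ids, rng):
    """Recursively build a VP-Tree over the point indices in `ids`."""
    ids = list(ids)
    if not ids:
        return None
    vp = ids.pop(rng.randrange(len(ids)))
    if not ids:
        return VPNode(vp, 0.0, None, None)
    ids.sort(key=lambda i: math.dist(data[vp], data[i]))
    mid = len(ids) // 2
    radius = math.dist(data[vp], data[ids[mid]])
    return VPNode(vp, radius,
                  build_vp_tree(data, ids[:mid], rng),
                  build_vp_tree(data, ids[mid:], rng))


def _offer(best, k, d, i):
    """Keep the k smallest distances in a max-heap (distances negated)."""
    if len(best) < k:
        heapq.heappush(best, (-d, i))
    elif d < -best[0][0]:
        heapq.heapreplace(best, (-d, i))


def exact_knn(data, root, query, k, seed_ids=()):
    """Exact kNN by best-first search; `seed_ids` are approximate candidates.

    Returns (sorted list of (distance, index), number of tree nodes visited).
    """
    seeded = set(seed_ids)
    best = []
    for i in seeded:  # assumption: seeds come from some approximate index
        _offer(best, k, math.dist(query, data[i]), i)

    tie = itertools.count()  # tie-breaker so nodes are never compared
    frontier = [(0.0, next(tie), root)] if root is not None else []
    visited = 0
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        tau = -best[0][0] if len(best) == k else math.inf
        if bound > tau:  # every remaining subtree is at least this far away
            break
        visited += 1
        d = math.dist(query, data[node.idx])
        if node.idx not in seeded:  # seeds were already offered above
            _offer(best, k, d, node.idx)
        if node.inside is not None:
            lower = max(bound, d - node.radius)
            heapq.heappush(frontier, (lower, next(tie), node.inside))
        if node.outside is not None:
            lower = max(bound, node.radius - d)
            heapq.heappush(frontier, (lower, next(tie), node.outside))
    return sorted((-nd, i) for nd, i in best), visited
```

Calling `exact_knn` with and without `seed_ids` returns the same neighbor distances; the seeded call visits no more tree nodes than the unseeded one, because its pruning threshold is never looser at any step.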

Cite

Text

Baechler et al. "ScreenAI: A Vision-Language Model for UI and Infographics Understanding." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/339

Markdown

[Baechler et al. "ScreenAI: A Vision-Language Model for UI and Infographics Understanding." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/baechler2024ijcai-screenai/) doi:10.24963/ijcai.2024/339

BibTeX

@inproceedings{baechler2024ijcai-screenai,
  title     = {{ScreenAI: A Vision-Language Model for UI and Infographics Understanding}},
  author    = {Baechler, Gilles and Sunkara, Srinivas and Wang, Maria and Zubach, Fedir and Mansoor, Hassan and Etter, Vincent and Carbune, Victor and Lin, Jason and Chen, Jindong and Sharma, Abhanshu},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {3058-3068},
  doi       = {10.24963/ijcai.2024/339},
  url       = {https://mlanthology.org/ijcai/2024/baechler2024ijcai-screenai/}
}