ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Abstract

The k-nearest neighbor (kNN) query is a cornerstone of similarity-based applications across various domains. While prior work has enhanced kNN search efficiency, it typically focuses on approximate methods for high-dimensional data or exact methods for low-dimensional data, often assuming static query and data distributions. This creates a significant gap in accelerating exact kNN search for low-to-medium dimensional data with dynamic query distributions. To fill this gap, we propose App2Exa, a cache-guided framework that integrates approximate and exact kNN search. App2Exa utilizes a dynamically maintained cache graph index to retrieve approximate results, which subsequently guide exact search using a VP-Tree with a best-first strategy. A benefit-driven caching mechanism further optimizes performance by prioritizing vectors based on frequency, recency, and computational cost. Experimental results demonstrate that App2Exa significantly boosts efficiency, providing a robust and scalable solution for evolving query patterns and enabling exact kNN search to support higher dimensionality more effectively.
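The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes the approximate stage can be stood in for by a plain list of cached vectors (not the cache graph index named in the abstract) and that those vectors are distinct members of the dataset. The k-th smallest distance among them is then a valid upper bound on the true k-th neighbor distance, and a best-first VP-Tree search can use it as an initial pruning radius. The names `VPNode`, `build_vp_tree`, and `exact_knn` are hypothetical.

```python
import heapq
import itertools
import math
import random


class VPNode:
    """One vantage point, its median radius mu, and two subtrees."""

    def __init__(self, point, mu, inside, outside):
        self.point, self.mu = point, mu
        self.inside, self.outside = inside, outside


def build_vp_tree(points):
    """Recursively split points by their distance to a vantage point."""
    if not points:
        return None
    vp, rest = points[0], points[1:]
    if not rest:
        return VPNode(vp, 0.0, None, None)
    dists = [math.dist(vp, p) for p in rest]
    mu = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(rest, dists) if d <= mu]
    outside = [p for p, d in zip(rest, dists) if d > mu]
    return VPNode(vp, mu, build_vp_tree(inside), build_vp_tree(outside))


def exact_knn(root, query, k, cached=()):
    """Exact kNN via best-first VP-Tree search, seeded by cached vectors.

    Assumption: `cached` holds distinct dataset members, so the k-th
    smallest cached distance cannot be below the true k-th NN distance.
    """
    seed = sorted(math.dist(query, p) for p in cached)
    seed_tau = seed[k - 1] if len(seed) >= k else math.inf

    best = []  # max-heap of the k closest points so far (negated distance)
    tie = itertools.count()  # tie-breaker so heap never compares nodes
    frontier = [(0.0, next(tie), root)] if root is not None else []
    while frontier:
        bound, _, node = heapq.heappop(frontier)
        kth = -best[0][0] if len(best) == k else math.inf
        # Every remaining subtree has a lower bound >= this one, so stop.
        if bound > min(seed_tau, kth):
            break
        d = math.dist(query, node.point)
        if len(best) < k:
            heapq.heappush(best, (-d, node.point))
        elif d < kth:
            heapq.heapreplace(best, (-d, node.point))
        # Triangle inequality gives a lower bound for each child region.
        if node.inside is not None:
            lb = max(bound, d - node.mu)
            heapq.heappush(frontier, (lb, next(tie), node.inside))
        if node.outside is not None:
            lb = max(bound, node.mu - d)
            heapq.heappush(frontier, (lb, next(tie), node.outside))
    return sorted((-nd, p) for nd, p in best)


if __name__ == "__main__":
    random.seed(0)
    data = [tuple(random.random() for _ in range(8)) for _ in range(400)]
    tree = build_vp_tree(data)
    query = tuple(random.random() for _ in range(8))
    cached = random.sample(data, 20)  # stand-in for approximate results
    for distance, point in exact_knn(tree, query, 5, cached):
        print(round(distance, 4))
```

Because the seeded bound only tightens the stopping test, the result stays exact. The benefit-driven cache policy (frequency, recency, computational cost) is not modeled here.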

Cite

Text

Baechler et al. "ScreenAI: A Vision-Language Model for UI and Infographics Understanding." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/339

Markdown

[Baechler et al. "ScreenAI: A Vision-Language Model for UI and Infographics Understanding." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/baechler2024ijcai-screenai/) doi:10.24963/ijcai.2024/339

BibTeX

@inproceedings{baechler2024ijcai-screenai,
  title     = {{ScreenAI: A Vision-Language Model for UI and Infographics Understanding}},
  author    = {Baechler, Gilles and Sunkara, Srinivas and Wang, Maria and Zubach, Fedir and Mansoor, Hassan and Etter, Vincent and Carbune, Victor and Lin, Jason and Chen, Jindong and Sharma, Abhanshu},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {3058--3068},
  doi       = {10.24963/ijcai.2024/339},
  url       = {https://mlanthology.org/ijcai/2024/baechler2024ijcai-screenai/}
}