Bi-Encoder Cascades for Efficient Image Search
Abstract
Modern neural encoders offer unprecedented text-image retrieval (TIR) accuracy, but their high computational cost impedes their adoption for large-scale image search. To lower this cost, model cascades use an expensive encoder to refine the ranking of a cheap encoder. However, existing cascading algorithms focus on cross-encoders, which jointly process text-image pairs, and do not consider cascades of bi-encoders, which process texts and images separately. We introduce the small-world search scenario as a realistic setting where bi-encoder cascades can reduce costs. We then propose a cascading algorithm that leverages the small-world search scenario to reduce the lifetime image encoding costs of a TIR system. Our experiments show cost reductions of up to 6x.
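The cascading idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `BiEncoderCascade`, the `top_k` parameter, the toy encoders, and the lazy caching policy are all assumptions made for illustration. The sketch assumes a cheap bi-encoder embeds every image up front, while the expensive image encoder runs only on images that reach a query's shortlist, and its embeddings are cached for later queries.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class BiEncoderCascade:
    """Hypothetical two-level cascade of bi-encoders (illustrative only)."""

    def __init__(
        self,
        images: List[Vector],
        cheap_image_enc: Callable[[Vector], Vector],
        costly_image_enc: Callable[[Vector], Vector],
        cheap_text_enc: Callable[[Vector], Vector],
        costly_text_enc: Callable[[Vector], Vector],
        top_k: int,
    ) -> None:
        self.images = list(images)
        self.costly_image_enc = costly_image_enc
        self.cheap_text_enc = cheap_text_enc
        self.costly_text_enc = costly_text_enc
        self.top_k = top_k
        # The cheap encoder is applied to every image once, up front.
        self.cheap_embs = [cheap_image_enc(img) for img in self.images]
        # Expensive embeddings are computed lazily and cached by image index.
        self.costly_cache: Dict[int, Vector] = {}
        self.costly_calls = 0

    def search(self, query: Vector) -> List[int]:
        """Return image indices ranked for the query."""
        q_cheap = self.cheap_text_enc(query)
        order = sorted(
            range(len(self.images)),
            key=lambda i: cosine(q_cheap, self.cheap_embs[i]),
            reverse=True,
        )
        shortlist = order[: self.top_k]
        # Only shortlisted images ever pay the expensive encoding cost.
        for i in shortlist:
            if i not in self.costly_cache:
                self.costly_cache[i] = self.costly_image_enc(self.images[i])
                self.costly_calls += 1
        q_costly = self.costly_text_enc(query)
        reranked = sorted(
            shortlist,
            key=lambda i: cosine(q_costly, self.costly_cache[i]),
            reverse=True,
        )
        return reranked + order[self.top_k:]


# Toy stand-ins for real encoders: the "cheap" encoder sees only the first
# two coordinates, the "costly" encoder sees the full vector.
images = [(1.0, 0.0, 0.0), (0.9, 0.1, 1.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
cascade = BiEncoderCascade(
    images,
    cheap_image_enc=lambda v: v[:2],
    costly_image_enc=lambda v: v,
    cheap_text_enc=lambda q: q[:2],
    costly_text_enc=lambda q: q,
    top_k=2,
)
first = cascade.search((1.0, 0.0, 1.0))
calls_after_first = cascade.costly_calls
second = cascade.search((1.0, 0.0, 1.0))
```

In this toy setup, repeating a query triggers no further expensive image encodings because the shortlisted embeddings are already cached. If queries mostly surface a small subset of the collection, the number of expensive encodings stays well below the collection size, which is the kind of saving the small-world search scenario is meant to capture.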
Cite
Text
Hönig et al. "Bi-Encoder Cascades for Efficient Image Search." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00146
Markdown
[Hönig et al. "Bi-Encoder Cascades for Efficient Image Search." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/honig2023iccvw-biencoder/) doi:10.1109/ICCVW60793.2023.00146
BibTeX
@inproceedings{honig2023iccvw-biencoder,
title = {{Bi-Encoder Cascades for Efficient Image Search}},
author = {Hönig, Robert and Ackermann, Jan and Chi, Mingyuan},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2023},
pages = {1350-1355},
doi = {10.1109/ICCVW60793.2023.00146},
url = {https://mlanthology.org/iccvw/2023/honig2023iccvw-biencoder/}
}