CROSS: Analyzing the Trade-Offs in Long-Context Cross-Lingual Retrieval
Abstract
Cross-lingual information retrieval in long-context settings faces challenges such as the "lost-in-the-middle" phenomenon and computational inefficiencies. We introduce CROSS (Cross-lingual Retrieval Optimized for Scalable Solutions), a two-phase retrieval framework that integrates multilingual embeddings with efficient candidate selection to enhance retrieval-augmented generation (RAG). Evaluating CROSS on the newly developed mLongRR-V2 benchmark—covering seven languages and 49 language pairs—we demonstrate substantial improvements in retrieval accuracy, scalability to 512,000-token contexts, and robustness across linguistic structures. Compared to baseline large language models (LLMs), CROSS significantly mitigates mid-context retrieval failures while reducing computational overhead. Our results establish CROSS as an efficient and scalable solution for multilingual long-context retrieval.
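The paper itself does not ship an implementation, but a minimal sketch of a two-phase retrieve-then-generate pipeline of the kind the abstract describes might look as follows. The embedding model, chunk size, and top-k value here are illustrative assumptions, not details taken from CROSS:

```python
# Minimal sketch of a two-phase cross-lingual retrieval pipeline.
# Assumptions (not from the paper): the multilingual embedding model,
# character-based chunking, and top_k=5 are placeholders; the actual
# CROSS components may differ.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def chunk(text: str, size: int = 512) -> list[str]:
    """Split a long document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(query: str, document: str, top_k: int = 5) -> list[str]:
    """Phase 1: embed the query and all chunks in a shared
    multilingual space. Phase 2: select the top-k candidates by
    cosine similarity; these can then be passed to an LLM for
    retrieval-augmented generation."""
    chunks = chunk(document)
    q_emb = model.encode([query], normalize_embeddings=True)
    c_embs = model.encode(chunks, normalize_embeddings=True)
    # With normalized embeddings, the dot product is cosine similarity.
    scores = (c_embs @ q_emb.T).squeeze(-1)
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]
```

Feeding only the top-k candidate chunks to the generator keeps the prompt short, which is the mechanism by which a pipeline like this sidesteps both mid-context ("lost-in-the-middle") retrieval failures and the cost of attending over very long contexts.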
Cite
Text
Nezhad and Agrawal. "CROSS: Analyzing the Trade-Offs in Long-Context Cross-Lingual Retrieval." ICLR 2025 Workshops: FM-Wild, 2025.
Markdown
[Nezhad and Agrawal. "CROSS: Analyzing the Trade-Offs in Long-Context Cross-Lingual Retrieval." ICLR 2025 Workshops: FM-Wild, 2025.](https://mlanthology.org/iclrw/2025/nezhad2025iclrw-cross/)
BibTeX
@inproceedings{nezhad2025iclrw-cross,
title = {{CROSS: Analyzing the Trade-Offs in Long-Context Cross-Lingual Retrieval}},
author = {Nezhad, Sina Bagheri and Agrawal, Ameeta},
booktitle = {ICLR 2025 Workshops: FM-Wild},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/nezhad2025iclrw-cross/}
}