CROSS: Analyzing the Trade-Offs in Long-Context Cross-Lingual Retrieval

Abstract

Cross-lingual information retrieval in long-context settings faces challenges such as the "lost-in-the-middle" phenomenon and computational inefficiencies. We introduce CROSS (Cross-lingual Retrieval Optimized for Scalable Solutions), a two-phase retrieval framework that integrates multilingual embeddings with efficient candidate selection to enhance retrieval-augmented generation (RAG). Evaluating CROSS on the newly developed mLongRR-V2 benchmark, which covers seven languages and 49 language pairs, we demonstrate substantial improvements in retrieval accuracy, scalability to 512,000-token contexts, and robustness across linguistic structures. Compared to baseline large language models (LLMs), CROSS significantly mitigates mid-context retrieval failures while reducing computational overhead. Our results establish CROSS as an efficient and scalable solution for multilingual long-context retrieval.
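To make the two-phase idea concrete, here is a minimal sketch of the general pattern the abstract describes: a cheap first pass shortlists candidate passages, and only that shortlist is re-scored with full-dimensional embedding similarity. This is an illustration of generic two-phase retrieval, not the paper's actual CROSS implementation; the function name, the truncated-embedding coarse scorer, and all parameters are assumptions for the example (a real system would use a multilingual encoder rather than the placeholder vectors used here).

```python
import numpy as np

def two_phase_retrieve(query_vec, doc_vecs, coarse_dim=8, n_candidates=5, top_k=2):
    """Sketch of two-phase retrieval: coarse candidate selection, then re-ranking.

    Phase 1 scores truncated (coarse) embeddings to shortlist candidates cheaply;
    phase 2 re-ranks only that shortlist with full-dimensional cosine similarity.
    Truncation stands in for any inexpensive first-pass scorer.
    """
    # Phase 1: dot products over the first `coarse_dim` dimensions only.
    coarse_scores = doc_vecs[:, :coarse_dim] @ query_vec[:coarse_dim]
    candidates = np.argsort(coarse_scores)[::-1][:n_candidates]

    # Phase 2: full cosine similarity, computed only for the shortlist.
    cand_vecs = doc_vecs[candidates]
    full_scores = (cand_vecs @ query_vec) / (
        np.linalg.norm(cand_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    order = np.argsort(full_scores)[::-1][:top_k]
    return candidates[order].tolist()

# Usage with toy orthogonal "embeddings": document 3 matches the query exactly,
# so it survives the coarse filter and is ranked first after re-ranking.
docs = np.eye(10, 16)          # 10 documents, 16-dim stand-in embeddings
query = np.zeros(16)
query[3] = 1.0
print(two_phase_retrieve(query, docs)[0])  # document index 3 is ranked first
```

The efficiency gain comes from phase 2 touching only `n_candidates` documents, so the expensive full-dimensional scoring cost is decoupled from corpus size.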

Cite

Text

Nezhad and Agrawal. "CROSS: Analyzing the Trade-Offs in Long-Context Cross-Lingual Retrieval." ICLR 2025 Workshops: FM-Wild, 2025.

Markdown

[Nezhad and Agrawal. "CROSS: Analyzing the Trade-Offs in Long-Context Cross-Lingual Retrieval." ICLR 2025 Workshops: FM-Wild, 2025.](https://mlanthology.org/iclrw/2025/nezhad2025iclrw-cross/)

BibTeX

@inproceedings{nezhad2025iclrw-cross,
  title     = {{CROSS: Analyzing the Trade-Offs in Long-Context Cross-Lingual Retrieval}},
  author    = {Nezhad, Sina Bagheri and Agrawal, Ameeta},
  booktitle = {ICLR 2025 Workshops: FM-Wild},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/nezhad2025iclrw-cross/}
}