Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship
Abstract
Dense retrieval has emerged as the leading approach in information retrieval, aiming to find semantically relevant documents based on natural language queries. Because a single document can be retrieved by multiple distinct queries, existing methods represent a document with multiple vectors, each aligned with a different query, to model the many-to-one relationship between queries and documents. However, these multi-vector approaches suffer from increased storage, vector collapse, and reduced search efficiency. To address these issues, we introduce the Distribution-Driven Dense Retrieval framework (DDR). Specifically, we use vectors to represent queries and distributions to represent documents. This approach not only captures the relationships among multiple queries corresponding to the same document but also avoids the need to represent the document with multiple vectors. Furthermore, to ensure search efficiency for DDR, we propose a dot product-based method for computing the similarity between documents represented by distributions and queries represented by vectors. This allows seamless integration with existing approximate nearest neighbor (ANN) search algorithms. Finally, we conduct extensive experiments on real-world datasets, which demonstrate that our method significantly outperforms traditional dense retrieval methods.
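The abstract does not give the similarity formula, so the following is only an illustrative sketch of how a distribution-versus-vector score can reduce to a single dot product. It assumes, purely for illustration, that each document is a Gaussian with diagonal covariance and that the score is log E[exp(q · d)] = q · mu + 0.5 * sum(q^2 * var). The function names `expand_query`, `expand_document`, and `score` are hypothetical and are not taken from the paper.

```python
import numpy as np

def expand_query(q):
    # Hypothetical helper: augment the query with its element-wise square, [q, q*q].
    return np.concatenate([q, q * q])

def expand_document(mu, var):
    # Hypothetical helper: pack the Gaussian mean and half the diagonal variance.
    return np.concatenate([mu, 0.5 * var])

def score(q, mu, var):
    # Assumed score (not the paper's): log E[exp(q . d)] for d ~ N(mu, diag(var)),
    # which equals q . mu + 0.5 * sum(q^2 * var).
    return float(q @ mu + 0.5 * np.sum(q * q * var))

rng = np.random.default_rng(0)
dim, n_docs = 8, 5
q = rng.normal(size=dim)
mus = rng.normal(size=(n_docs, dim))
vars_ = rng.uniform(0.1, 1.0, size=(n_docs, dim))

# One fixed-size vector per document, so the index stays single-vector.
index = np.stack([expand_document(m, v) for m, v in zip(mus, vars_)])

# Scoring every document is a plain matrix-vector product.
dot_scores = index @ expand_query(q)
direct_scores = np.array([score(q, m, v) for m, v in zip(mus, vars_)])
best = int(np.argmax(dot_scores))
```

Under this assumption each document is stored as one vector of twice the embedding dimension, and ranking by `dot_scores` is an ordinary maximum inner product search, which is the form that ANN libraries accept.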
Cite
Text
Kang et al. "Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I11.33299
Markdown
[Kang et al. "Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/kang2025aaai-distribution/) doi:10.1609/AAAI.V39I11.33299
BibTeX
@inproceedings{kang2025aaai-distribution,
title = {{Distribution-Driven Dense Retrieval: Modeling Many-to-One Query-Document Relationship}},
author = {Kang, Junfeng and Li, Rui and Liu, Qi and Huang, Zhenya and Zhang, Zheng and Chen, Yanjiang and Zhu, Linbo and Su, Yu},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {11933-11941},
doi = {10.1609/AAAI.V39I11.33299},
url = {https://mlanthology.org/aaai/2025/kang2025aaai-distribution/}
}