TriSampler: A Better Negative Sampling Principle for Dense Retrieval

Abstract

Negative sampling stands as a pivotal technique in dense retrieval, essential for training effective retrieval models and significantly impacting retrieval performance. While existing negative sampling methods have made commendable progress by leveraging hard negatives, a comprehensive guiding principle for constructing negative candidates and designing negative sampling distributions is still lacking. To bridge this gap, we embark on a theoretical analysis of negative sampling in dense retrieval. This exploration culminates in the unveiling of the quasi-triangular principle, a novel framework that elucidates the triangular-like interplay between query, positive document, and negative document. Fueled by this guiding principle, we introduce TriSampler, a straightforward yet highly effective negative sampling method. The key point of TriSampler lies in its ability to selectively sample more informative negatives within a prescribed constrained region. Experimental evaluations show that TriSampler consistently attains superior retrieval performance across a diverse set of representative retrieval models.

Cite

Text

Yang et al. "TriSampler: A Better Negative Sampling Principle for Dense Retrieval." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I8.28779

Markdown

[Yang et al. "TriSampler: A Better Negative Sampling Principle for Dense Retrieval." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/yang2024aaai-trisampler/) doi:10.1609/AAAI.V38I8.28779

BibTeX

@inproceedings{yang2024aaai-trisampler,
  title     = {{TriSampler: A Better Negative Sampling Principle for Dense Retrieval}},
  author    = {Yang, Zhen and Shao, Zhou and Dong, Yuxiao and Tang, Jie},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {9269--9277},
  doi       = {10.1609/AAAI.V38I8.28779},
  url       = {https://mlanthology.org/aaai/2024/yang2024aaai-trisampler/}
}