InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers
Abstract
We carried out a reproducibility study of InPars (Bonifacio et al., 2022), a method for unsupervised training of neural rankers. As a by-product, we developed InPars-light, a simple yet effective modification of InPars. Unlike InPars, InPars-light uses 7x-100x smaller ranking models and only the freely available BLOOM language model, which, as we found, produced more accurate rankers than the proprietary GPT-3 model. On all five English retrieval collections used in the original InPars study, we obtained substantial (7%-30%) and statistically significant improvements over BM25 (in nDCG and MRR) using only a 30M-parameter six-layer MiniLM-30M ranker and a single three-shot prompt. In contrast, in the InPars study only the 100x larger monoT5-3B model consistently outperformed BM25, whereas the smaller monoT5-220M model (still 7x larger than our MiniLM ranker) outperformed BM25 only on MS MARCO and TREC DL 2020. In the same three-shot prompting scenario, our 435M-parameter DeBERTa v3 ranker was on par with the 7x larger monoT5-3B (average gain over BM25 of 1.30 vs. 1.32); in fact, on three of the five datasets DeBERTa slightly outperformed monoT5-3B. Finally, these results were achieved by re-ranking only 100 candidate documents, compared to 1000 in Bonifacio et al. (2022). We believe InPars-light is the first truly cost-effective prompt-based unsupervised recipe to train and deploy neural ranking models that outperform BM25. Our code and data are publicly available at https://github.com/searchivarius/inpars_light/.
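The recipe above hinges on few-shot prompting an LLM to generate synthetic queries for unlabeled documents, which then serve as training data for a small ranker. Below is a minimal sketch of how such a three-shot prompt can be assembled; the prompt wording and the example (document, query) pairs here are hypothetical illustrations, not the actual prompt from the paper.

```python
# Hedged sketch of InPars-style three-shot prompting for synthetic query
# generation. The examples and template below are illustrative assumptions;
# the paper's actual prompt may differ.

FEW_SHOT_EXAMPLES = [
    ("The Manhattan Project produced the first nuclear weapons during WWII.",
     "who developed the first nuclear weapons"),
    ("Photosynthesis converts light energy into chemical energy in plants.",
     "how do plants convert sunlight into energy"),
    ("The Great Barrier Reef is the world's largest coral reef system.",
     "what is the largest coral reef in the world"),
]

def build_prompt(document: str) -> str:
    """Assemble a three-shot prompt: three (document, query) demonstrations
    followed by the target document with an open query slot. A language
    model (e.g., BLOOM) would then complete the final 'Relevant Query:'
    line, yielding a synthetic training query for this document."""
    parts = []
    for doc, query in FEW_SHOT_EXAMPLES:
        parts.append(f"Document: {doc}\nRelevant Query: {query}\n")
    parts.append(f"Document: {document}\nRelevant Query:")
    return "\n".join(parts)

prompt = build_prompt(
    "BM25 is a bag-of-words ranking function used by search engines.")
```

The generated (query, document) pairs can then be used to fine-tune a compact cross-encoder ranker, which at inference time re-ranks only the top-100 BM25 candidates per query.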
Cite
Text
Boytsov et al. "InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers." Transactions on Machine Learning Research, 2024.
Markdown
[Boytsov et al. "InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/boytsov2024tmlr-inparslight/)
BibTeX
@article{boytsov2024tmlr-inparslight,
title = {{InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers}},
author = {Boytsov, Leonid and Patel, Preksha and Sourabh, Vivek and Nisar, Riddhi and Kundu, Sayani and Ramanathan, Ramya and Nyberg, Eric},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/boytsov2024tmlr-inparslight/}
}