Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling

Abstract

This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by utilizing a distinct property of knowledge-intensive generation: the connection of query-answer pairs to items in the knowledge base. We filter training examples via a confidence threshold on the relevance labels, i.e., whether a pair is answerable by the knowledge base or not. We train a single Fusion-in-Decoder (FiD) generator on seven combined tasks of the KILT benchmark. The experimental results suggest that our simple yet effective approach substantially improves competitive baselines on two strongly imbalanced tasks, and shows either smaller improvements or no significant regression on the remaining tasks. Furthermore, we demonstrate that our multi-task training with relevance label sampling scales well with increased model capacity and achieves state-of-the-art results on five out of seven KILT tasks.
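To make the filtering idea concrete, the sketch below shows one plausible way to drop training examples whose relevance confidence falls below a threshold. It is only an illustration of the general technique described in the abstract, not the authors' code; the function name, the threshold value, and the confidence scores are hypothetical.

```python
# Illustrative sketch (not the authors' implementation) of cleaning a training
# set by thresholding relevance-label confidence, as described in the abstract.
# `relevance_confidence` and `tau` are hypothetical names, not from the paper.

def filter_by_relevance(examples, relevance_confidence, tau=0.5):
    """Keep only query-answer pairs whose relevance confidence (the estimated
    probability that the pair is answerable from the knowledge base) is at
    least the threshold `tau`."""
    return [ex for ex, conf in zip(examples, relevance_confidence) if conf >= tau]


# Hypothetical usage: `train_set` holds (query, answer) pairs and `scores`
# their relevance confidences, e.g. produced by a retrieval/relevance model.
train_set = [("who wrote hamlet?", "William Shakespeare"),
             ("best pizza topping?", "pineapple")]
scores = [0.93, 0.12]
clean_train_set = filter_by_relevance(train_set, scores, tau=0.5)
print(clean_train_set)  # -> [('who wrote hamlet?', 'William Shakespeare')]
```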

Cite

Text

Hofstätter et al. "Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling." ICML 2022 Workshops: KRLM, 2022.

Markdown

[Hofstätter et al. "Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling." ICML 2022 Workshops: KRLM, 2022.](https://mlanthology.org/icmlw/2022/hofstatter2022icmlw-multitask/)

BibTeX

@inproceedings{hofstatter2022icmlw-multitask,
  title     = {{Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling}},
  author    = {Hofstätter, Sebastian and Chen, Jiecao and Raman, Karthik and Zamani, Hamed},
  booktitle = {ICML 2022 Workshops: KRLM},
  year      = {2022},
  url       = {https://mlanthology.org/icmlw/2022/hofstatter2022icmlw-multitask/}
}