Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling
Abstract
This paper studies multi-task training of retrieval-augmented generation models for knowledge-intensive tasks. We propose to clean the training set by utilizing a distinct property of knowledge-intensive generation: the connection of query-answer pairs to items in the knowledge base. We filter training examples via a confidence threshold on the relevance labels, i.e., whether a pair is answerable by the knowledge base or not. We train a single Fusion-in-Decoder (FiD) generator on seven combined tasks of the KILT benchmark. The experimental results suggest that our simple yet effective approach substantially improves over competitive baselines on two strongly imbalanced tasks, and shows either smaller improvements or no significant regression on the remaining tasks. Furthermore, we demonstrate that our multi-task training with relevance label sampling scales well with increased model capacity and achieves state-of-the-art results in five out of seven KILT tasks.
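The filtering step described above can be illustrated with a minimal sketch: keep only training examples whose relevance confidence (how likely the query-answer pair is answerable from the knowledge base) meets a chosen threshold. The field names, threshold value, and helper function below are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of relevance-threshold filtering (not the paper's code).
# Assumes each example carries a confidence score that the query-answer pair
# is answerable from the knowledge base.

from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    query: str
    answer: str
    relevance_confidence: float  # confidence that the pair is answerable from the KB


def filter_by_relevance(examples: List[Example], threshold: float = 0.5) -> List[Example]:
    """Keep only examples whose relevance confidence meets the threshold."""
    return [ex for ex in examples if ex.relevance_confidence >= threshold]


# Usage: clean a combined multi-task training set before training the FiD generator.
train_set = [
    Example("Who wrote Hamlet?", "William Shakespeare", relevance_confidence=0.93),
    Example("What did I eat yesterday?", "Pasta", relevance_confidence=0.04),
]
cleaned = filter_by_relevance(train_set, threshold=0.5)  # drops unanswerable pairs
```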
Cite
Text

Hofstätter et al. "Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling." ICML 2022 Workshops: KRLM, 2022.

Markdown

[Hofstätter et al. "Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling." ICML 2022 Workshops: KRLM, 2022.](https://mlanthology.org/icmlw/2022/hofstatter2022icmlw-multitask/)

BibTeX
@inproceedings{hofstatter2022icmlw-multitask,
title = {{Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling}},
author = {Hofstätter, Sebastian and Chen, Jiecao and Raman, Karthik and Zamani, Hamed},
booktitle = {ICML 2022 Workshops: KRLM},
year = {2022},
url = {https://mlanthology.org/icmlw/2022/hofstatter2022icmlw-multitask/}
}