One Question Answering Model for Many Languages with Cross-Lingual Dense Passage Retrieval

Abstract

We present Cross-lingual Open-Retrieval Answer Generation (CORA), the first unified many-to-many question answering (QA) model that can answer questions across many languages, even for ones without language-specific annotated data or knowledge sources.We introduce a new dense passage retrieval algorithm that is trained to retrieve documents across languages for a question.Combined with a multilingual autoregressive generation model, CORA answers directly in the target language without any translation or in-language retrieval modules as used in prior work. We propose an iterative training method that automatically extends annotated data available only in high-resource languages to low-resource ones. Our results show that CORA substantially outperforms the previous state of the art on multilingual open QA benchmarks across 26 languages, 9 of which are unseen during training. Our analyses show the significance of cross-lingual retrieval and generation in many languages, particularly under low-resource settings.

Cite

Text

Asai et al. "One Question Answering Model for Many Languages with Cross-Lingual Dense Passage Retrieval." Neural Information Processing Systems, 2021.

Markdown

[Asai et al. "One Question Answering Model for Many Languages with Cross-Lingual Dense Passage Retrieval." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/asai2021neurips-one/)

BibTeX

@inproceedings{asai2021neurips-one,
  title     = {{One Question Answering Model for Many Languages with Cross-Lingual Dense Passage Retrieval}},
  author    = {Asai, Akari and Yu, Xinyan and Kasai, Jungo and Hajishirzi, Hanna},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/asai2021neurips-one/}
}