MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Abstract

Text-to-SQL semantic parsing is an important NLP task that facilitates interaction between users and databases. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MultiSpider, the largest multilingual text-to-SQL semantic parsing dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Building on MultiSpider, we further identify the lexical and structural challenges of text-to-SQL (caused by language-specific properties and dialectal expressions) and their intensity across different languages. Experimental results under various settings (zero-shot, monolingual, and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. We conduct qualitative and quantitative analyses to understand the reasons for the performance drop in each language. Besides the dataset, we also propose a simple schema augmentation framework, SAVe (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.

Cite

Text

Dou et al. "MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I11.26499

Markdown

[Dou et al. "MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/dou2023aaai-multispider/) doi:10.1609/AAAI.V37I11.26499

BibTeX

@inproceedings{dou2023aaai-multispider,
  title     = {{MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing}},
  author    = {Dou, Longxu and Gao, Yan and Pan, Mingyang and Wang, Dingzirui and Che, Wanxiang and Zhan, Dechen and Lou, Jian-Guang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {12745--12753},
  doi       = {10.1609/AAAI.V37I11.26499},
  url       = {https://mlanthology.org/aaai/2023/dou2023aaai-multispider/}
}