Domain Adaptive Pretraining for Multilingual Acronym Extraction

Abstract

This paper presents our findings from participating in the multilingual acronym extraction shared task SDU@AAAI-22. The task consists of acronym extraction from documents in six languages within the scientific and legal domains. To address multilingual acronym extraction, we employed a BiLSTM-CRF model with multilingual XLM-RoBERTa embeddings. We further pretrained XLM-RoBERTa on the shared task corpus to adapt its embeddings to the task domains. Our system (team: SMR-NLP) achieved competitive performance for acronym extraction across all six languages.
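As a rough illustration of the domain-adaptive pretraining step described above, the sketch below continues masked-language-model training of XLM-RoBERTa on the shared task corpus before it is used to supply embeddings for the BiLSTM-CRF tagger. The file path, corpus layout (one document per line, all six languages pooled), and hyperparameters are assumptions for illustration, not details taken from the paper.

# Hedged sketch: domain-adaptive MLM pretraining of XLM-RoBERTa with Hugging Face
# transformers. Paths and hyperparameters are assumed, not from the paper.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Assumed layout: one sentence/document per line, pooled across the six task languages.
corpus = load_dataset("text", data_files={"train": "shared_task_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Standard masked-language-model objective: randomly mask 15% of tokens.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="xlmr-acronym-dapt",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=5e-5,
    save_total_limit=1,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()

# The adapted encoder can then provide token embeddings to a downstream BiLSTM-CRF sequence tagger.
model.save_pretrained("xlmr-acronym-dapt")
tokenizer.save_pretrained("xlmr-acronym-dapt")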

Cite

Text

Yaseen and Langer. "Domain Adaptive Pretraining for Multilingual Acronym Extraction." AAAI Conference on Artificial Intelligence, 2022. doi:10.48550/arxiv.2206.15221

Markdown

[Yaseen and Langer. "Domain Adaptive Pretraining for Multilingual Acronym Extraction." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/yaseen2022aaai-domain/) doi:10.48550/arxiv.2206.15221

BibTeX

@inproceedings{yaseen2022aaai-domain,
  title     = {{Domain Adaptive Pretraining for Multilingual Acronym Extraction}},
  author    = {Yaseen, Usama and Langer, Stefan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  doi       = {10.48550/arxiv.2206.15221},
  url       = {https://mlanthology.org/aaai/2022/yaseen2022aaai-domain/}
}