Cross-Domain Constituency Parsing by Leveraging Heterogeneous Data

Abstract

Knowledge transfer has been investigated in various natural language processing tasks, with the exception of cross-domain constituency parsing. In this paper, we leverage heterogeneous data to transfer cross-domain and cross-task knowledge to constituency parsing. Concretely, we first select language modeling, named entity recognition, CCG supertagging, and dependency parsing as auxiliary tasks, and collect corpora for these tasks covering various domains as cross-domain and cross-task heterogeneous data. Second, we exploit three types of prefixes (shared, task, and domain) to merge the cross-domain and cross-task data and to decompose the general, task, and domain representations in the pretrained language model. Third, we convert the data formats of the multi-source heterogeneous datasets and the loss objectives of the auxiliary tasks into a consistent formalization closer to constituency parsing. Finally, we jointly train the model to transfer task and domain knowledge to cross-domain constituency parsing. We verify the effectiveness of our proposed model on five target domains of MCTB. Experimental results show that our knowledge transfer model outperforms various baseline models, including conventional chart-based and transition-based parsers and a current large-scale language model, in zero-shot and few-shot settings.
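
As a rough illustration of the second step, the sketch below shows one way a shared, a task, and a domain prefix could be combined for a single training example. It is not the authors' implementation: the abstract does not say how prefixes are represented or ordered, so the placeholder string tokens, the shared-task-domain order, the `prefix_len` parameter, and the function names `build_prefix` and `with_prefix` are all assumptions made for illustration.

```python
def build_prefix(task: str, domain: str, prefix_len: int = 2) -> list[str]:
    """Return placeholder prefix slots: shared first, then task, then domain.

    Assumption: each prefix type gets `prefix_len` slots. In a real model
    these slots would more likely be trainable vectors than string tokens.
    """
    shared = [f"<shared_{i}>" for i in range(prefix_len)]
    task_part = [f"<task:{task}_{i}>" for i in range(prefix_len)]
    domain_part = [f"<domain:{domain}_{i}>" for i in range(prefix_len)]
    return shared + task_part + domain_part


def with_prefix(tokens: list[str], task: str, domain: str,
                prefix_len: int = 2) -> list[str]:
    """Prepend the composed prefix to one tokenized example."""
    return build_prefix(task, domain, prefix_len) + list(tokens)


if __name__ == "__main__":
    # Two examples from different (hypothetical) tasks and domains share the
    # same shared slots but differ in their task and domain slots.
    print(with_prefix(["Paris", "is", "big"], task="ner", domain="news"))
    print(with_prefix(["The", "court", "ruled"], task="parsing", domain="law"))
```

Under this assumed layout, examples from any auxiliary task and any domain can be mixed in one training stream, since the prefix alone identifies which task and domain an example belongs to.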

Cite

Text

Guo et al. "Cross-Domain Constituency Parsing by Leveraging Heterogeneous Data." Journal of Artificial Intelligence Research, 2024. doi:10.1613/JAIR.1.15736

Markdown

[Guo et al. "Cross-Domain Constituency Parsing by Leveraging Heterogeneous Data." Journal of Artificial Intelligence Research, 2024.](https://mlanthology.org/jair/2024/guo2024jair-crossdomain/) doi:10.1613/JAIR.1.15736

BibTeX

@article{guo2024jair-crossdomain,
  title     = {{Cross-Domain Constituency Parsing by Leveraging Heterogeneous Data}},
  author    = {Guo, Peiming and Zhang, Meishan and Chen, Yulong and Li, Jianling and Zhang, Min and Zhang, Yue},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2024},
  pages     = {771--791},
  doi       = {10.1613/JAIR.1.15736},
  volume    = {81},
  url       = {https://mlanthology.org/jair/2024/guo2024jair-crossdomain/}
}