Error Types in Transformer-Based Paraphrasing Models: A Taxonomy, Paraphrase Annotation Model and Dataset

Berro, Auday; Benatallah, Boualem; Gaci, Yacine; Benabdeslem, Khalid

doi:10.1007/978-3-031-70341-6_20

ECML-PKDD 2024 pp. 332-349

Abstract

Developing task-oriented bots requires diverse sets of annotated user utterances to learn mappings between natural language utterances and user intents. Automated paraphrase generation offers a cost-effective and scalable approach for generating varied training samples by creating different versions of the same utterance. However, existing sequence-to-sequence models used in automated paraphrasing often suffer from errors such as repetition and grammatical mistakes. Identifying these errors, particularly in transformer architectures, has become a challenge. In this paper, we propose a taxonomy of errors encountered in transformer-based paraphrase generation models, based on a comprehensive error analysis of transformer-generated paraphrases. Leveraging this taxonomy, we introduce the Transformer-based Paraphrasing Model Errors dataset, consisting of 5880 annotated paraphrases labeled with error types and explanations. Additionally, we develop a novel multilabel paraphrase annotation model by fine-tuning a BERT model for the error annotation task. Evaluation against human annotations demonstrates significant agreement, with the model showing robust performance in predicting error labels, even for unseen paraphrases.

Cite

Text

Berro et al. "Error Types in Transformer-Based Paraphrasing Models: A Taxonomy, Paraphrase Annotation Model and Dataset." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024. doi:10.1007/978-3-031-70341-6_20

Markdown

[Berro et al. "Error Types in Transformer-Based Paraphrasing Models: A Taxonomy, Paraphrase Annotation Model and Dataset." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2024.](https://mlanthology.org/ecmlpkdd/2024/berro2024ecmlpkdd-error/) doi:10.1007/978-3-031-70341-6_20

BibTeX

@inproceedings{berro2024ecmlpkdd-error,
  title     = {{Error Types in Transformer-Based Paraphrasing Models: A Taxonomy, Paraphrase Annotation Model and Dataset}},
  author    = {Berro, Auday and Benatallah, Boualem and Gaci, Yacine and Benabdeslem, Khalid},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2024},
  pages     = {332--349},
  doi       = {10.1007/978-3-031-70341-6_20},
  url       = {https://mlanthology.org/ecmlpkdd/2024/berro2024ecmlpkdd-error/}
}