A Few Good Sentences: Content Selection for Abstractive Text Summarization

Srivastava, Vivek; Bhat, Savita; Pedanekar, Niranjan

doi:10.1007/978-3-031-43421-1_8

ECML-PKDD 2023 pp. 124-141

Abstract

Abstractive text summarization has been of research interest for decades. Neural approaches, specifically recent transformer-based methods, have demonstrated promising performance in generating summaries with novel words and paraphrases. Despite generating more fluent summaries, these approaches may still select summary-worthy content poorly. In these methods, extractive content selection depends largely on the reference summary, with little to no focus on identifying the summary-worthy segments (SWORTS) in a reference-free setting. In this work, we leverage three metrics, namely informativeness, relevance, and redundancy, in selecting the SWORTS. We propose a novel topic-informed and reference-free method to rank the sentences in the source document based on their importance. We demonstrate the effectiveness of SWORTS selection in different settings such as fine-tuning, few-shot tuning, and zero-shot abstractive text summarization. We observe that self-training and cross-training a pre-trained model with SWORTS-selected data yield performance competitive with the pre-trained model. Furthermore, a small amount of SWORTS-selected data is sufficient for domain adaptation, compared with fine-tuning on the entire training dataset with no content selection. Compared with training a model on the source dataset with no content selection, we observe a significant reduction in the time required to train a model with SWORTS, which further underlines the importance of content selection for training an abstractive text summarization model.
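The abstract names three signals for ranking source sentences (informativeness, relevance, redundancy) but does not spell out how they are computed. The sketch below is only an illustration of how such a reference-free ranking could be wired together; it is not the authors' topic-informed method. It swaps in simple bag-of-words proxies and a greedy, MMR-style selection loop. The function name `select_sentences`, the weights `w_info`, `w_rel`, `w_red`, and the specific proxy for each metric are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(sentences, k=2, w_info=1.0, w_rel=1.0, w_red=1.0):
    """Greedily pick k sentence indices without using a reference summary.

    Assumed proxies (not taken from the paper):
      informativeness: mean token surprisal under the document's unigram
                       distribution, scaled to [0, 1]
      relevance:       cosine similarity between sentence and whole document
      redundancy:      highest cosine similarity to an already selected sentence
    """
    bags = [Counter(tokenize(s)) for s in sentences]
    doc = Counter()
    for bag in bags:
        doc.update(bag)
    total = sum(doc.values())
    max_surprisal = math.log(total) if total > 1 else 1.0

    def informativeness(bag):
        n = sum(bag.values())
        if n == 0:
            return 0.0
        surprisal = sum(c * -math.log(doc[t] / total) for t, c in bag.items())
        return surprisal / n / max_surprisal

    # Informativeness and relevance do not depend on what has been selected.
    base = [w_info * informativeness(b) + w_rel * cosine(b, doc) for b in bags]

    selected = []
    while len(selected) < min(k, len(sentences)):
        best_idx, best_score = None, float("-inf")
        for i, bag in enumerate(bags):
            if i in selected:
                continue
            redundancy = max((cosine(bag, bags[j]) for j in selected), default=0.0)
            score = base[i] - w_red * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


if __name__ == "__main__":
    doc_sentences = [
        "the cat sat on the mat",
        "the cat sat on the mat",
        "dogs chase balls in the park",
    ]
    for idx in select_sentences(doc_sentences, k=2):
        print(idx, doc_sentences[idx])
```

In this toy setup the redundancy penalty keeps a verbatim duplicate from being chosen alongside its twin. A topic model or sentence embeddings could replace the bag-of-words proxies without changing the selection loop.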

Cite

Text

Srivastava et al. "A Few Good Sentences: Content Selection for Abstractive Text Summarization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43421-1_8

Markdown

[Srivastava et al. "A Few Good Sentences: Content Selection for Abstractive Text Summarization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/srivastava2023ecmlpkdd-few/) doi:10.1007/978-3-031-43421-1_8

BibTeX

@inproceedings{srivastava2023ecmlpkdd-few,
  title     = {{A Few Good Sentences: Content Selection for Abstractive Text Summarization}},
  author    = {Srivastava, Vivek and Bhat, Savita and Pedanekar, Niranjan},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {124--141},
  doi       = {10.1007/978-3-031-43421-1_8},
  url       = {https://mlanthology.org/ecmlpkdd/2023/srivastava2023ecmlpkdd-few/}
}