A Few Good Sentences: Content Selection for Abstractive Text Summarization
Abstract
Abstractive text summarization has been of research interest for decades. Neural approaches, specifically recent transformer-based methods, have demonstrated promising performance in generating summaries with novel words and paraphrases. Although they generate more fluent summaries, these approaches may still select summary-worthy content poorly. In these methods, extractive content selection depends largely on the reference summary, with little to no focus on identifying the summary-worthy segments (SWORTS) in a reference-free setting. In this work, we leverage three metrics, namely informativeness, relevance, and redundancy, in selecting the SWORTS. We propose a novel topic-informed and reference-free method to rank the sentences in the source document based on their importance. We demonstrate the effectiveness of SWORTS selection in different settings such as fine-tuning, few-shot tuning, and zero-shot abstractive text summarization. We observe that self-training and cross-training a pre-trained model with SWORTS-selected data yields performance competitive with that of the pre-trained model. Furthermore, a small amount of SWORTS-selected data is sufficient for domain adaptation, compared with fine-tuning on the entire training dataset with no content selection. Compared with training a model on the source dataset with no content selection, we observe a significant reduction in the time required to train a model with SWORTS, which further underlines the importance of content selection for training an abstractive text summarization model.
Cite
Text
Srivastava et al. "A Few Good Sentences: Content Selection for Abstractive Text Summarization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43421-1_8
Markdown
[Srivastava et al. "A Few Good Sentences: Content Selection for Abstractive Text Summarization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/srivastava2023ecmlpkdd-few/) doi:10.1007/978-3-031-43421-1_8
BibTeX
@inproceedings{srivastava2023ecmlpkdd-few,
title = {{A Few Good Sentences: Content Selection for Abstractive Text Summarization}},
author = {Srivastava, Vivek and Bhat, Savita and Pedanekar, Niranjan},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2023},
pages = {124-141},
doi = {10.1007/978-3-031-43421-1_8},
url = {https://mlanthology.org/ecmlpkdd/2023/srivastava2023ecmlpkdd-few/}
}