Enhancing Summarization with Text Classification via Topic Consistency

Abstract

The recent success of abstractive summarization is partly due to the availability of large-volume and high-quality human-produced summaries for training, which are extremely expensive to obtain. In this paper, we aim to improve state-of-the-art summarization models by utilizing less expensive text classification data. Specifically, we use an eXtreme Multi-label Text Classification (XMTC) classifier to predict relevant category labels for each input document, and impose topic consistency in the system-produced summary or in the document encoder shared by both the classifier and the summarization model. In other words, we use the classifier to distill the training of the summarization model with respect to topical consistency between the input document and the system-generated summary. Technically, we propose two novel formulations for this objective, namely a multi-task approach, and a policy gradient approach. Our experiments show that both approaches significantly improve a state-of-the-art BART summarization model on the CNNDM and XSum datasets. In addition, we propose a new evaluation metric, CON, that measures the topic consistency between the input document and the summary. We show that CON has high correlation with human judgements and is a good complementary metric to the commonly used ROUGE scores.
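The abstract names two training formulations and a consistency metric but gives no formulas. As a rough illustration only, the sketch below shows one way such quantities could be written down. Everything in it is an assumption rather than the paper's definition: the function names, the top-k Jaccard overlap used as a stand-in for topic consistency, the weighted-sum form of the multi-task loss, and the REINFORCE-style surrogate for the policy gradient variant are all hypothetical. The classifier scores are made-up inputs, not outputs of a real XMTC model.

```python
from typing import Dict, Set


def top_k_labels(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k highest-scoring labels (ties broken by label name)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {label for label, _ in ranked[:k]}


def topic_consistency(doc_scores: Dict[str, float],
                      summary_scores: Dict[str, float],
                      k: int = 3) -> float:
    """Assumed stand-in for a consistency score: Jaccard overlap of the
    top-k labels predicted for the document and for the summary.
    This is NOT the paper's CON definition."""
    doc_labels = top_k_labels(doc_scores, k)
    summary_labels = top_k_labels(summary_scores, k)
    union = doc_labels | summary_labels
    if not union:
        return 0.0
    return len(doc_labels & summary_labels) / len(union)


def multi_task_loss(summarization_loss: float,
                    classification_loss: float,
                    weight: float = 0.5) -> float:
    """Assumed multi-task objective: summarization loss plus a weighted
    classification loss computed on a shared encoder."""
    return summarization_loss + weight * classification_loss


def policy_gradient_loss(sample_log_prob: float,
                         reward: float,
                         baseline: float = 0.0) -> float:
    """Assumed REINFORCE-style surrogate loss, where the reward could be
    a consistency score for a sampled summary."""
    return -(reward - baseline) * sample_log_prob


if __name__ == "__main__":
    # Hypothetical label scores for a document and a generated summary.
    doc = {"politics": 0.9, "economy": 0.7, "sports": 0.1, "health": 0.05}
    summ = {"politics": 0.8, "economy": 0.2, "sports": 0.6, "health": 0.1}
    reward = topic_consistency(doc, summ, k=2)
    print("consistency:", reward)
    print("multi-task loss:", multi_task_loss(2.0, 1.0, weight=0.5))
    print("policy-gradient loss:", policy_gradient_loss(-1.5, reward, 0.0))
```

In this toy setup the summary shares only one of its two top labels with the document, so it would earn a lower reward than a summary whose predicted topics match the document's.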

Cite

Text

Liu and Yang. "Enhancing Summarization with Text Classification via Topic Consistency." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86523-8_40

Markdown

[Liu and Yang. "Enhancing Summarization with Text Classification via Topic Consistency." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/liu2021ecmlpkdd-enhancing/) doi:10.1007/978-3-030-86523-8_40

BibTeX

@inproceedings{liu2021ecmlpkdd-enhancing,
  title     = {{Enhancing Summarization with Text Classification via Topic Consistency}},
  author    = {Liu, Jingzhou and Yang, Yiming},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {661--676},
  doi       = {10.1007/978-3-030-86523-8_40},
  url       = {https://mlanthology.org/ecmlpkdd/2021/liu2021ecmlpkdd-enhancing/}
}