Unsupervised Deep Keyphrase Generation

Abstract

Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated remarkable success in this task, with the capability of predicting keyphrases that are even absent from a document. However, such abstractiveness is acquired at the expense of a substantial amount of annotated data. In this paper, we present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any annotated doc-keyphrase pairs. Motivated by the observation that an absent keyphrase in a document may appear in other places, in whole or in part, we construct a phrase bank by pooling all phrases extracted from a corpus. With this phrase bank, we assign phrase candidates to new documents by a simple partial matching algorithm, and then we rank these candidates by their relevance to the document from both lexical and semantic perspectives. Moreover, we bootstrap a deep generative model using these top-ranked pseudo keyphrases to produce more absent candidates. Extensive experiments demonstrate that AutoKeyGen outperforms all unsupervised baselines and can even beat a strong supervised method in certain cases.
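The phrase-bank and partial-matching steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_phrase_bank`, `assign_candidates`, `rank_lexical`), the regex tokenizer, the matching rule (a bank phrase counts as an absent candidate when all of its tokens occur somewhere in the document but not contiguously), and the averaged TF-IDF score are all assumptions chosen for illustration. The semantic ranking and the bootstrapped generative model are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_phrase_bank(phrases_per_doc):
    # Pool the phrases extracted from every document in the corpus.
    # The extraction step itself is assumed to have happened already.
    bank = set()
    for phrases in phrases_per_doc:
        for phrase in phrases:
            bank.add(" ".join(tokenize(phrase)))
    bank.discard("")
    return bank


def contains_phrase(doc_tokens, phrase_tokens):
    # True if phrase_tokens occurs as a contiguous run in doc_tokens.
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def assign_candidates(document, bank):
    # Assumed matching rule: contiguous match -> present candidate;
    # all tokens found but not contiguous -> absent candidate.
    doc_tokens = tokenize(document)
    vocab = set(doc_tokens)
    present, absent = [], []
    for phrase in sorted(bank):
        phrase_tokens = phrase.split()
        if contains_phrase(doc_tokens, phrase_tokens):
            present.append(phrase)
        elif all(tok in vocab for tok in phrase_tokens):
            absent.append(phrase)
    return present, absent


def rank_lexical(document, candidates, corpus):
    # Stand-in lexical score: mean smoothed TF-IDF of the phrase tokens.
    doc_tf = Counter(tokenize(document))
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(tokenize(text)))
    n_docs = len(corpus)

    def score(phrase):
        toks = phrase.split()
        total = sum(
            doc_tf[t] * math.log((1 + n_docs) / (1 + doc_freq[t]))
            for t in toks
        )
        return total / len(toks)

    return sorted(candidates, key=score, reverse=True)
```

Under these assumptions, a phrase such as "deep models" pooled from another document would be assigned as an absent candidate to a document containing "deep neural models", since both tokens occur but never adjacently.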

Cite

Text

Shen et al. "Unsupervised Deep Keyphrase Generation." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I10.21381

Markdown

[Shen et al. "Unsupervised Deep Keyphrase Generation." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/shen2022aaai-unsupervised/) doi:10.1609/AAAI.V36I10.21381

BibTeX

@inproceedings{shen2022aaai-unsupervised,
  title     = {{Unsupervised Deep Keyphrase Generation}},
  author    = {Shen, Xianjie and Wang, Yinghan and Meng, Rui and Shang, Jingbo},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {11303--11311},
  doi       = {10.1609/AAAI.V36I10.21381},
  url       = {https://mlanthology.org/aaai/2022/shen2022aaai-unsupervised/}
}