Unsupervised Phrasal Near-Synonym Generation from Text Corpora

Abstract

Unsupervised discovery of synonymous phrases is useful in a variety of tasks, ranging from text mining and search engines to semantic analysis and machine translation. This paper presents an unsupervised, corpus-based conditional model, the Near-Synonym System (NeSS), for finding phrasal synonyms and near-synonyms that requires only a large monolingual corpus. The method is based on maximizing information-theoretic combinations of shared contexts and is parallelizable for large-scale processing. An evaluation framework with crowd-sourced judgments is proposed, and results are compared with alternative methods, demonstrating considerably better results than prior methods in the literature and than thesaurus lookup for multi-word phrases. Moreover, the results show that the statistical scoring functions and the overall scalability of the system matter more than language-specific NLP tools. The method is language-independent and practically usable owing to its accuracy and its real-time performance via parallel decomposition.
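The abstract describes ranking candidate phrases by information-theoretic combinations of the contexts they share with a query phrase, with counting decomposed over a large corpus. The sketch below is a generic illustration of that idea, not the NeSS scoring functions (those are defined in the paper and are not reproduced here). It assumes that a context is one word on each side of a phrase, that context weights are positive pointwise mutual information (PPMI), and that overlap is measured with a weighted Jaccard score. The function names, the `<s>` boundary marker, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Phrase = Tuple[str, ...]
# Assumption: a context is the (left word, right word) pair around a phrase.
Context = Tuple[str, str]
Vectors = Dict[Phrase, Dict[Context, float]]

PAD = "<s>"  # hypothetical sentence-boundary marker


def count_contexts(sentences: Iterable[List[str]], max_len: int = 3) -> Counter:
    """Count (phrase, context) pairs for every n-gram of up to max_len tokens.

    Shards of the corpus can be counted independently and merged by
    adding the Counters, which is what makes this step parallelizable.
    """
    counts: Counter = Counter()
    for sent in sentences:
        padded = [PAD] + sent + [PAD]
        last = len(padded) - 1  # index of the closing PAD
        for i in range(1, last):
            for n in range(1, max_len + 1):
                j = i + n  # phrase is padded[i:j]; padded[j] is its right context
                if j > last:
                    break
                counts[(tuple(padded[i:j]), (padded[i - 1], padded[j]))] += 1
    return counts


def ppmi_vectors(counts: Counter) -> Vectors:
    """Turn raw (phrase, context) counts into positive-PMI context vectors."""
    total = sum(counts.values())
    phrase_tot: Counter = Counter()
    ctx_tot: Counter = Counter()
    for (p, c), n in counts.items():
        phrase_tot[p] += n
        ctx_tot[c] += n
    vectors: Vectors = {}
    for (p, c), n in counts.items():
        pmi = math.log(n * total / (phrase_tot[p] * ctx_tot[c]))
        if pmi > 0:  # keep only positively associated contexts
            vectors.setdefault(p, {})[c] = pmi
    return vectors


def shared_context_score(q: Dict[Context, float], c: Dict[Context, float]) -> float:
    """Weighted Jaccard overlap of two PPMI context vectors (0.0 to 1.0)."""
    keys = set(q) | set(c)
    num = sum(min(q.get(k, 0.0), c.get(k, 0.0)) for k in keys)
    den = sum(max(q.get(k, 0.0), c.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def rank_near_synonyms(query: Phrase, vectors: Vectors, top_k: int = 5):
    """Return up to top_k (phrase, score) pairs that share contexts with query."""
    qv = vectors.get(query, {})
    scored = [(p, shared_context_score(qv, v)) for p, v in vectors.items() if p != query]
    scored = [(p, s) for p, s in scored if s > 0]
    scored.sort(key=lambda ps: (-ps[1], ps[0]))
    return scored[:top_k]


# Toy corpus split into two shards (hypothetical data).
shard_a = [["he", "kicked", "the", "bucket", "yesterday"],
           ["he", "passed", "away", "yesterday"]]
shard_b = [["she", "kicked", "the", "bucket", "last", "year"],
           ["she", "passed", "away", "last", "year"]]

# Count each shard independently (e.g. on separate workers), then merge.
counts = sum((count_contexts(s) for s in (shard_a, shard_b)), Counter())
vectors = ppmi_vectors(counts)
print(rank_near_synonyms(("kicked", "the", "bucket"), vectors))
# → [(('passed', 'away'), 1.0)]
```

In this toy corpus, "kicked the bucket" and "passed away" occur in exactly the same contexts, so they receive a perfect overlap score. The counting step touches each sentence only once and its per-shard results merge by simple addition, which mirrors the kind of parallel decomposition the abstract refers to, although the paper's own decomposition may differ.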

Cite

Text

Gupta et al. "Unsupervised Phrasal Near-Synonym Generation from Text Corpora." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9504

Markdown

[Gupta et al. "Unsupervised Phrasal Near-Synonym Generation from Text Corpora." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/gupta2015aaai-unsupervised/) doi:10.1609/AAAI.V29I1.9504

BibTeX

@inproceedings{gupta2015aaai-unsupervised,
  title     = {{Unsupervised Phrasal Near-Synonym Generation from Text Corpora}},
  author    = {Gupta, Dishan and Carbonell, Jaime G. and Gershman, Anatole and Klein, Steve and Miller, David},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2015},
  pages     = {2253--2259},
  doi       = {10.1609/AAAI.V29I1.9504},
  url       = {https://mlanthology.org/aaai/2015/gupta2015aaai-unsupervised/}
}