Identifying Similar Words and Contexts in Natural Language with SenseClusters
Abstract
SenseClusters is a freely available intelligent system that clusters together similar contexts in natural language text. Thereafter it assigns identifying labels to these clusters based on their content. It is a purely unsupervised approach that is language independent, and uses no knowledge other than what is available in raw un-annotated corpora. In addition to clustering similar contexts, it can be used to identify synonyms and sets of related words. It has been applied to a diverse range of problems, including proper name disambiguation, word sense discrimination, email organization, and document clustering. SenseClusters is a complete system that supports feature selection from large corpora, several different context representation schemes, various clustering algorithms, the creation of descriptive and discriminating labels for the discovered clusters, and evaluation relative to gold standard data.
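The stages named in the abstract (feature selection, context representation, clustering, and labeling) can be illustrated with a small toy sketch. This is not SenseClusters code and does not reflect its actual interface or algorithms. Every function name, the stop list, the frequency cutoff, and the choice of average-link agglomerative clustering over cosine similarity are assumptions made for illustration only. The labels here are purely descriptive (most frequent features per cluster); discriminating labels and gold-standard evaluation are not attempted.

```python
from collections import Counter
from math import sqrt

# Hypothetical stop list, chosen only for this toy example.
STOP = {"the", "was", "we", "from"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stop words and non-alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOP]

def select_features(contexts, min_count=2):
    """Keep words that occur in at least min_count contexts (assumed cutoff)."""
    doc_freq = Counter(w for c in contexts for w in set(tokenize(c)))
    return sorted(w for w, n in doc_freq.items() if n >= min_count)

def vectorize(context, features):
    """Represent a context as a vector of feature counts."""
    counts = Counter(tokenize(context))
    return [counts[f] for f in features]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster(vectors, k):
    """Average-link agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        x, y = max(pairs, key=lambda pq: avg_sim(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters.pop(y)      # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return clusters

def describe(cluster_ids, contexts, features, top_n=2):
    """Descriptive label: the most frequent selected features in the cluster."""
    keep = set(features)
    counts = Counter(w for i in cluster_ids
                     for w in tokenize(contexts[i]) if w in keep)
    return [w for w, _ in counts.most_common(top_n)]

# Toy contexts containing the ambiguous word "bank".
contexts = [
    "the bank approved the loan",
    "the bank raised the loan rate",
    "the river bank was muddy",
    "we fished from the muddy river bank",
]
features = select_features(contexts)
vectors = [vectorize(c, features) for c in contexts]
clusters = cluster(vectors, k=2)
for ids in clusters:
    print(ids, describe(ids, contexts, features))
```

On these four contexts the sketch separates the financial uses of "bank" from the riverside ones using only the raw text, which is the kind of knowledge-free grouping the abstract describes. Real corpora would need far more data and more careful feature selection than this toy cutoff.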
Cite
Text
Pedersen and Kulkarni. "Identifying Similar Words and Contexts in Natural Language with SenseClusters." AAAI Conference on Artificial Intelligence, 2005.
Markdown
[Pedersen and Kulkarni. "Identifying Similar Words and Contexts in Natural Language with SenseClusters." AAAI Conference on Artificial Intelligence, 2005.](https://mlanthology.org/aaai/2005/pedersen2005aaai-identifying/)
BibTeX
@inproceedings{pedersen2005aaai-identifying,
title = {{Identifying Similar Words and Contexts in Natural Language with SenseClusters}},
author = {Pedersen, Ted and Kulkarni, Anagha},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2005},
pages = {1694--1695},
url = {https://mlanthology.org/aaai/2005/pedersen2005aaai-identifying/}
}