Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis

Abstract

We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris' distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness.
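To make the lattice construction concrete, the following minimal sketch computes the formal concepts of a small term/attribute context. It is not the authors' implementation: the toy terms, the attribute names, and the helper functions `intent`, `extent`, `formal_concepts`, and `is_subconcept` are assumptions chosen for illustration, and the naive enumeration over all term subsets is exponential, so it only suits tiny inputs.

```python
from itertools import combinations

# Hypothetical toy context: each term maps to the set of attributes
# (e.g. derived from verb-object dependencies) observed with it.
context = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
}


def intent(terms, ctx):
    """Attributes shared by every term in `terms` (all attributes if empty)."""
    if not terms:
        return frozenset().union(*ctx.values())
    return frozenset(set.intersection(*(ctx[t] for t in terms)))


def extent(attrs, ctx):
    """Terms that carry every attribute in `attrs`."""
    return frozenset(t for t, a in ctx.items() if attrs <= a)


def formal_concepts(ctx):
    """Naively enumerate all (extent, intent) pairs by closing each term subset."""
    terms = list(ctx)
    concepts = set()
    for size in range(len(terms) + 1):
        for subset in combinations(terms, size):
            shared = intent(subset, ctx)
            concepts.add((extent(shared, ctx), shared))
    return concepts


def is_subconcept(c1, c2):
    """c1 <= c2 in the concept lattice iff the extent of c1 is contained in that of c2."""
    return c1[0] <= c2[0]


concepts = formal_concepts(context)

# List concepts from most general (largest extent) to most specific.
for ext, itt in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
```

In this toy context the concept with intent `{bookable, rentable}` sits above the one with intent `{bookable, rentable, driveable}`, which is the kind of ordering that can then be read as a concept hierarchy.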

Cite

Text

Cimiano et al. "Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis." Journal of Artificial Intelligence Research, 2005. doi:10.1613/JAIR.1648

Markdown

[Cimiano et al. "Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis." Journal of Artificial Intelligence Research, 2005.](https://mlanthology.org/jair/2005/cimiano2005jair-learning/) doi:10.1613/JAIR.1648

BibTeX

@article{cimiano2005jair-learning,
  title     = {{Learning Concept Hierarchies from Text Corpora Using Formal Concept Analysis}},
  author    = {Cimiano, Philipp and Hotho, Andreas and Staab, Steffen},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2005},
  pages     = {305--339},
  doi       = {10.1613/JAIR.1648},
  volume    = {24},
  url       = {https://mlanthology.org/jair/2005/cimiano2005jair-learning/}
}