Using Semantic Distance in a Content-Based Heterogeneous Information Retrieval System

Abstract

This paper makes two contributions to semantic retrieval of heterogeneous information (documents composed of text and images): (1) a new context-based semantic distance measure for textual data, and (2) an IR system that indexes documents conceptually and automatically, taking their heterogeneous content into account through a domain-specific ontology. The proposed semantic distance measure is used to automatically fuzzify our domain ontology. Both proposals are evaluated. On a set of word pairs, our semantic distance measure reached a correlation ratio of 0.89 with human judgments, outperforming all the other measures. Preliminary combination results obtained on a specialized corpus of web pages are also reported.

Cite

Text

El Sayed et al. "Using Semantic Distance in a Content-Based Heterogeneous Information Retrieval System." European Conference on Machine Learning, 2007. doi:10.1007/978-3-540-68416-9_18

Markdown

[El Sayed et al. "Using Semantic Distance in a Content-Based Heterogeneous Information Retrieval System." European Conference on Machine Learning, 2007.](https://mlanthology.org/ecmlpkdd/2007/sayed2007ecml-using/) doi:10.1007/978-3-540-68416-9_18

BibTeX

@inproceedings{sayed2007ecml-using,
  title     = {{Using Semantic Distance in a Content-Based Heterogeneous Information Retrieval System}},
  author    = {El Sayed, Ahmad and Hacid, Hakim and Zighed, Djamel A.},
  booktitle = {European Conference on Machine Learning},
  year      = {2007},
  pages     = {224-237},
  doi       = {10.1007/978-3-540-68416-9_18},
  url       = {https://mlanthology.org/ecmlpkdd/2007/sayed2007ecml-using/}
}