A Language-Based Similarity Measure

Abstract

This paper presents a unified framework for defining similarity measures across various formalisms (attribute-value, first-order logic, etc.). The underlying idea is that the similarity between two objects depends not only on their attribute values but, above all, on the set of potentially relevant concept definitions for the problem at hand. In our framework, the user specifies the similarity measure by defining a language with a grammar. Each term of the language represents a property of the objects. The similarity between two objects is the probability that they agree on a property of the given language, that is, that they either both satisfy it or both reject it. When this probability cannot be computed exactly, we approximate it with a stochastic generation procedure. The measure can be applied to both clustering and classification tasks. An empirical evaluation on common classification problems shows very good accuracy.
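The core idea (similarity as the probability that two objects agree on a property drawn from a language) can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not the authors' definitions: the language is assumed to consist of attribute-value tests `attr = value`, the exact measure assumes a uniform weighting over a finite set of such tests, and the stochastic variant assumes a made-up grammar `T -> atom | atom AND T` sampled with a hypothetical conjunction probability `p_conj`. The function names `atoms`, `exact_similarity`, `sample_term`, and `sampled_similarity` are illustrative only.

```python
import random
from typing import Callable, Dict, List, Tuple

Obj = Dict[str, str]          # an object: attribute name -> value
Prop = Callable[[Obj], bool]  # a term of the language = a property of objects


def atoms(domains: Dict[str, List[str]]) -> List[Tuple[str, Prop]]:
    """Hypothetical finite language: one test 'attr = value' per attribute value."""
    out: List[Tuple[str, Prop]] = []
    for attr, values in domains.items():
        for v in values:
            # Default arguments bind the current attr/value into each lambda.
            out.append((f"{attr}={v}", lambda o, a=attr, val=v: o[a] == val))
    return out


def exact_similarity(x: Obj, y: Obj, language: List[Tuple[str, Prop]]) -> float:
    """Fraction of terms on which x and y agree (both true or both false),
    i.e. the agreement probability under a uniform distribution over terms."""
    agree = sum(1 for _, p in language if p(x) == p(y))
    return agree / len(language)


def sample_term(domains: Dict[str, List[str]], rng: random.Random,
                p_conj: float = 0.3) -> Prop:
    """Draw one term from the toy grammar T -> atom | atom AND T."""
    attr = rng.choice(sorted(domains))
    val = rng.choice(domains[attr])
    atom: Prop = lambda o, a=attr, v=val: o[a] == v
    if rng.random() < p_conj:
        rest = sample_term(domains, rng, p_conj)  # recurse: extend the conjunction
        return lambda o: atom(o) and rest(o)
    return atom


def sampled_similarity(x: Obj, y: Obj, domains: Dict[str, List[str]],
                       n: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the agreement probability over generated terms."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n):
        t = sample_term(domains, rng)
        agree += t(x) == t(y)  # bool counts as 0 or 1
    return agree / n


if __name__ == "__main__":
    domains = {"color": ["red", "blue", "green"], "size": ["S", "L"]}
    x = {"color": "red", "size": "S"}
    y = {"color": "red", "size": "L"}
    print(exact_similarity(x, y, atoms(domains)))
    print(sampled_similarity(x, y, domains))
```

In the exact case, `x` and `y` agree on the three `color` tests and disagree on both `size` tests, giving 3/5 = 0.6. The sampled estimate differs because the toy grammar induces a different distribution over terms; the choice of language and of its distribution is precisely what determines the similarity.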

Cite

Text

Martin and Moal. "A Language-Based Similarity Measure." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_29

Markdown

[Martin and Moal. "A Language-Based Similarity Measure." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/martin2001ecml-languagebased/) doi:10.1007/3-540-44795-4_29

BibTeX

@inproceedings{martin2001ecml-languagebased,
  title     = {{A Language-Based Similarity Measure}},
  author    = {Martin, Lionel and Moal, Frédéric},
  booktitle = {European Conference on Machine Learning},
  year      = {2001},
  pages     = {336-347},
  doi       = {10.1007/3-540-44795-4_29},
  url       = {https://mlanthology.org/ecmlpkdd/2001/martin2001ecml-languagebased/}
}