Constraint-Based Entity Matching
Abstract
Entity matching is the problem of deciding whether two given mentions in the data, such as "Helen Hunt" and "H. M. Hunt", refer to the same real-world entity. Numerous solutions have been developed, but they have not considered in depth the problem of exploiting integrity constraints that frequently exist in the application domains. Examples of such constraints include "a mention with age two cannot match a mention with salary 200K" and "if two paper citations match, then their authors are likely to match in the same order". In this paper we describe a probabilistic solution to entity matching that exploits such constraints to improve matching accuracy. At the heart of the solution is a generative model that takes the constraints into account during the generation process and provides well-defined interpretations of them. We describe a novel combination of EM and relaxation labeling algorithms that efficiently learns the model, thereby matching mentions in an unsupervised way, without the need for annotated training data. Experiments on several real-world domains show that our solution can exploit constraints to significantly improve matching accuracy, by 3-12% F-1, and that the solution scales up to large data sets.
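The abstract names two kinds of constraints: a hard one that rules a match out entirely (age two versus salary 200K) and a soft one that makes related matches more likely (matching citations suggest matching authors). The sketch below illustrates, under loudly stated assumptions, how such constraints could adjust pairwise match probabilities with a relaxation-labeling step. It assumes binary match/non-match labels, made-up prior probabilities, and the classic nonlinear relaxation-labeling update; it is not the paper's generative model or its EM procedure, and every name here (`relaxation_labeling`, `age_salary_conflict`, `weight`, the mention ids and thresholds) is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # a candidate match between two mention ids


def relaxation_labeling(
    prior: Dict[Pair, float],
    neighbors: Dict[Pair, List[Pair]],
    violates: Callable[[Pair], bool],
    weight: float = 1.0,
    iterations: int = 10,
) -> Dict[Pair, float]:
    """Binary relaxation labeling over candidate pairs (illustrative sketch only)."""
    # Hard constraints: any pair that violates a constraint is fixed at 0.
    probs = {p: 0.0 if violates(p) else q for p, q in prior.items()}
    for _ in range(iterations):
        updated = dict(probs)
        for pair, p_match in probs.items():
            nbrs = neighbors.get(pair, [])
            if violates(pair) or not nbrs:
                continue  # clamped by a hard constraint, or no soft evidence
            # Soft constraint: matching neighbors support this pair matching,
            # non-matching neighbors support it not matching.
            avg = sum(probs[n] for n in nbrs) / len(nbrs)
            support_match = weight * avg
            support_non = weight * (1.0 - avg)
            num = p_match * (1.0 + support_match)
            den = num + (1.0 - p_match) * (1.0 + support_non)
            updated[pair] = num / den
        probs = updated  # synchronous: every pair used last round's values
    return probs


# Hypothetical mention attributes for the example.
attrs: Dict[str, dict] = {
    "kid": {"age": 2},
    "earner": {"salary": 200_000},
    "cite1": {}, "cite2": {}, "auth1": {}, "auth2": {},
}


def clash(x: dict, y: dict) -> bool:
    # Hypothetical thresholds: a very young person cannot earn a high salary.
    return x.get("age", 99) < 5 and y.get("salary", 0) >= 100_000


def age_salary_conflict(pair: Pair) -> bool:
    a, b = attrs[pair[0]], attrs[pair[1]]
    return clash(a, b) or clash(b, a)


prior = {("cite1", "cite2"): 0.9, ("auth1", "auth2"): 0.5, ("kid", "earner"): 0.8}
# The author pair draws support from the citation pair it belongs to.
neighbors = {("auth1", "auth2"): [("cite1", "cite2")]}
result = relaxation_labeling(prior, neighbors, age_salary_conflict)
```

Because each update multiplies the odds of a match by a factor greater than one whenever a pair's neighbors are more likely than not to match, repeated iterations push the author pair toward a confident match, while the hard constraint keeps the toddler/high-earner pair at zero regardless of its prior.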
Cite
Shen et al. "Constraint-Based Entity Matching." AAAI Conference on Artificial Intelligence, 2005.
@inproceedings{shen2005aaai-constraint,
title = {{Constraint-Based Entity Matching}},
author = {Shen, Warren and Li, Xin and Doan, AnHai},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2005},
pages = {862--867},
url = {https://mlanthology.org/aaai/2005/shen2005aaai-constraint/}
}