A Hierarchical Graphical Model for Record Linkage

Abstract

The task of matching co-referent records is known, among other names, as record linkage. In large record-linkage problems there is often little or no labeled data available, yet the unlabeled data shows a reasonably clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised ones. In this paper, we describe a hierarchical graphical model framework for the record-linkage problem in an unsupervised setting. In addition to proposing new methods, we cast existing unsupervised probabilistic record-linkage methods in this framework. Some of the techniques we propose to minimize overfitting in this model are of interest in the general graphical-model setting. We describe a method for incorporating monotonicity constraints in a graphical model. We also outline a bootstrapping approach that uses "single-field" classifiers to noisily label latent variables in a hierarchical model. Experimental results show that our proposed unsupervised methods are competitive even with fully supervised record-linkage methods.
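To make the unsupervised setting concrete, the following is a minimal sketch of a classical baseline of the kind the abstract groups under "existing unsupervised probabilistic record-linkage methods": a two-class mixture over binary field-agreement vectors, fitted with EM and no labels. It is not the hierarchical model proposed in the paper. The function names (`fit_agreement_mixture`, `simulate_pairs`), the initial values, and the synthetic agreement rates are all assumptions chosen for illustration.

```python
import random


def fit_agreement_mixture(vectors, n_iter=50, p_match=0.1):
    """Fit a two-class mixture of independent Bernoullis with EM.

    vectors: list of equal-length tuples of 0/1, one per record pair,
             where entry j is 1 if field j agrees (illustrative encoding).
    Returns (p_match, m, u, resp), where m[j] = P(agree on j | match),
    u[j] = P(agree on j | non-match), and resp[i] = P(match | pair i).
    """
    k = len(vectors[0])
    n = len(vectors)
    eps = 1e-6  # keeps probabilities strictly inside (0, 1)
    m = [0.9] * k  # assumed starting point: matches mostly agree
    u = [0.1] * k  # assumed starting point: non-matches mostly disagree
    resp = [p_match] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        resp = []
        for v in vectors:
            like_m, like_u = p_match, 1.0 - p_match
            for j, agrees in enumerate(v):
                like_m *= m[j] if agrees else 1.0 - m[j]
                like_u *= u[j] if agrees else 1.0 - u[j]
            resp.append(like_m / (like_m + like_u))
        # M-step: re-estimate the class prior and per-field agreement rates.
        total = sum(resp)
        p_match = min(max(total / n, eps), 1.0 - eps)
        for j in range(k):
            mj = sum(r * v[j] for r, v in zip(resp, vectors)) / max(total, eps)
            uj = sum((1.0 - r) * v[j] for r, v in zip(resp, vectors)) / max(n - total, eps)
            m[j] = min(max(mj, eps), 1.0 - eps)
            u[j] = min(max(uj, eps), 1.0 - eps)
    return p_match, m, u, resp


def simulate_pairs(n_match, n_nonmatch, n_fields, seed=0):
    """Synthetic agreement vectors: matches first, then non-matches."""
    rng = random.Random(seed)

    def draw(p_agree):
        return tuple(int(rng.random() < p_agree) for _ in range(n_fields))

    return ([draw(0.9) for _ in range(n_match)]
            + [draw(0.1) for _ in range(n_nonmatch)])
```

Calling `fit_agreement_mixture(simulate_pairs(200, 800, 4))` returns the estimated match prior, the per-field agreement rates for each class, and a match posterior per pair; thresholding the posterior gives a link decision. Each field here is reduced to a single agree/disagree bit and the fields are treated as independent given the match class, which is a simplification made only for this sketch.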

Cite

Text

Ravikumar and Cohen. "A Hierarchical Graphical Model for Record Linkage." Conference on Uncertainty in Artificial Intelligence, 2004. doi:10.5555/1036843.1036898

Markdown

[Ravikumar and Cohen. "A Hierarchical Graphical Model for Record Linkage." Conference on Uncertainty in Artificial Intelligence, 2004.](https://mlanthology.org/uai/2004/ravikumar2004uai-hierarchical/) doi:10.5555/1036843.1036898

BibTeX

@inproceedings{ravikumar2004uai-hierarchical,
  title     = {{A Hierarchical Graphical Model for Record Linkage}},
  author    = {Ravikumar, Pradeep and Cohen, William W.},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2004},
  pages     = {454-461},
  doi       = {10.5555/1036843.1036898},
  url       = {https://mlanthology.org/uai/2004/ravikumar2004uai-hierarchical/}
}