Automatic Acquisition of Lexical Knowledge from Sparse and Noisy Data

Abstract

Optical character recognition (OCR) still introduces a considerable amount of information reduction and noise into texts, so that many documents are unsuitable for information extraction systems. This paper introduces a statistical method for bootstrapping a lexicon from a very small number of “noisy,” domain-specific texts. The method detects regularities in grammatical forms as well as recurring ungrammatical forms in the input text. Through a combination of frequency lists and Levenshtein matrices, a language-independent, robust core lexicon is constructed that also supports the analysis of “noisy” texts.
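The core idea of combining frequency lists with Levenshtein distances can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function names, thresholds (`min_freq`, `max_dist`), and the simple nearest-seed mapping are assumptions chosen to show how frequent forms can seed a lexicon while rarer forms within a small edit distance are treated as noisy variants.

```python
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    # Standard dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def bootstrap_lexicon(tokens, min_freq=2, max_dist=1):
    # Forms occurring at least min_freq times seed the core lexicon;
    # rarer forms within max_dist edits of a seed are mapped to it as
    # presumed OCR variants (hypothetical heuristic for illustration).
    counts = Counter(tokens)
    seeds = [w for w, c in counts.most_common() if c >= min_freq]
    lexicon = {w: w for w in seeds}
    for w in counts:
        if w in lexicon:
            continue
        best = min(seeds, key=lambda s: levenshtein(w, s), default=None)
        if best is not None and levenshtein(w, best) <= max_dist:
            lexicon[w] = best  # normalize the noisy form to its seed
    return lexicon
```

On a toy token stream such as `["house", "house", "house", "hause", "tree", "tree", "trec"]`, the frequent forms `house` and `tree` become seeds, and the one-off OCR-like variants `hause` and `trec` are mapped to them.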

Cite

Text

René Schneider. "Automatic Acquisition of Lexical Knowledge from Sparse and Noisy Data." European Conference on Machine Learning, 1998. doi:10.1007/BFB0026670

Markdown

[René Schneider. "Automatic Acquisition of Lexical Knowledge from Sparse and Noisy Data." European Conference on Machine Learning, 1998.](https://mlanthology.org/ecmlpkdd/1998/schneider1998ecml-automatic/) doi:10.1007/BFB0026670

BibTeX

@inproceedings{schneider1998ecml-automatic,
  title     = {{Automatic Acquisition of Lexical Knowledge from Sparse and Noisy Data}},
  author    = {Schneider, René},
  booktitle = {European Conference on Machine Learning},
  year      = {1998},
  pages     = {43--48},
  doi       = {10.1007/BFB0026670},
  url       = {https://mlanthology.org/ecmlpkdd/1998/schneider1998ecml-automatic/}
}