Multistrategy Learning for Information Extraction
Abstract
Information extraction (IE) is the problem of filling out pre-defined structured summaries from text documents. We are interested in performing IE in non-traditional domains, where much of the text is often ungrammatical, such as electronic bulletin board posts and Web pages. We suggest that the best approach is one that takes into account many different kinds of information, and argue for the suitability of a multistrategy approach. We describe learners for IE drawn from three separate machine learning paradigms: rote memorization, term-space text classification, and relational rule induction. By building regression models mapping from learner confidence to probability of correctness and combining probabilities appropriately, it is possible to improve extraction accuracy over that achieved by any individual learner. We describe three different multistrategy approaches. Experiments on two IE domains, a collection of electronic seminar announcements from a university computer science de...
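The abstract says that each learner's confidence is mapped to a probability of correctness through a regression model, and that these probabilities are then combined. It does not give the regression form or the combination rule, and the text here is truncated. The following is therefore only a minimal Python sketch under two stated assumptions. First, each learner is calibrated with a one-variable logistic regression fit on held-out (confidence, correct?) pairs. Second, calibrated probabilities are merged with a noisy-OR rule, which assumes the learners err independently. The function names (`fit_calibrator`, `combine_noisy_or`), the learner labels, and the training data are hypothetical and are not taken from Freitag's paper.

```python
import math

def fit_calibrator(confidences, correct, lr=0.1, epochs=2000):
    """Hypothetical calibrator: fit P(correct | c) = sigmoid(a*c + b)
    by batch gradient descent on log loss (an assumed regression form)."""
    a, b = 0.0, 0.0
    n = len(confidences)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for c, y in zip(confidences, correct):
            p = 1.0 / (1.0 + math.exp(-(a * c + b)))
            grad_a += (p - y) * c   # d(log loss)/da
            grad_b += (p - y)       # d(log loss)/db
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    # Return a function mapping raw learner confidence to a probability.
    return lambda c: 1.0 / (1.0 + math.exp(-(a * c + b)))

def combine_noisy_or(probs):
    """Assumed combination rule (noisy-OR): if learners err independently,
    the fragment is wrong only when every learner is wrong."""
    all_wrong = 1.0
    for p in probs:
        all_wrong *= (1.0 - p)
    return 1.0 - all_wrong

# Hypothetical held-out data: raw confidence vs. whether the extraction was correct.
rote = fit_calibrator([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 1, 0, 1, 1])
term_space = fit_calibrator([0.2, 0.4, 0.5, 0.6, 0.8, 0.95], [0, 0, 1, 1, 0, 1])

# Two learners propose the same text fragment with different raw confidences.
p = combine_noisy_or([rote(0.85), term_space(0.6)])
print(round(p, 3))
```

Calibrating first is what makes the combination meaningful. Raw confidences from a memorizer, a term-space classifier, and a rule learner live on different scales, but once each is mapped to a probability of correctness they can be compared and merged directly. Noisy-OR is only one plausible rule. Simply taking the most probable learner's prediction is another, and the independence assumption behind noisy-OR may overstate confidence when learners share errors.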
Cite
Freitag. "Multistrategy Learning for Information Extraction." International Conference on Machine Learning, 1998.
@inproceedings{freitag1998icml-multistrategy,
title = {{Multistrategy Learning for Information Extraction}},
author = {Freitag, Dayne},
booktitle = {International Conference on Machine Learning},
year = {1998},
pages = {161--169},
url = {https://mlanthology.org/icml/1998/freitag1998icml-multistrategy/}
}