Maximum Entropy Markov Models for Information Extraction and Segmentation

McCallum, Andrew; Freitag, Dayne; Pereira, Fernando C. N.

ICML 2000 pp. 591-598

Abstract

Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many text-related tasks, such as part-of-speech tagging, text segmentation and information extraction. In these cases, the observations are usually modeled as multinomial distributions over a discrete vocabulary, and the HMM parameters are set to maximize the likelihood of the observations. This paper presents a new Markovian sequence model, closely related to HMMs, that allows observations to be represented as arbitrary overlapping features (such as word, capitalization, formatting, part-of-speech), and defines the conditional probability of state sequences given observation sequences. It does this by using the maximum entropy framework to fit a set of exponential models that represent the probability of a state given an observation and the previous state. We present positive experimental results on the segmentation of FAQ's.
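As a rough illustration of the modeling idea in the abstract, the sketch below computes the probability of a next state given an observation and the previous state, using one exponential (softmax) model per previous state over overlapping binary features. The state names, feature names, and weight values are hypothetical and chosen only for illustration; they are not taken from the paper, and the step of fitting the weights under the maximum entropy framework is omitted.

```python
import math

# Hypothetical label set for a FAQ-style segmentation task.
STATES = ["question", "answer"]


def features(obs):
    """Overlapping binary features of one observation (here, a line of text)."""
    return {
        "ends_with_question_mark": obs.rstrip().endswith("?"),
        "starts_with_capital": obs[:1].isupper(),
        "is_indented": obs.startswith(" "),
    }


# Hypothetical weights, one table per previous state.
# Keys are (feature name, next state); missing pairs count as weight 0.
WEIGHTS = {
    "question": {
        ("ends_with_question_mark", "question"): 1.0,
        ("is_indented", "answer"): 2.0,
    },
    "answer": {
        ("ends_with_question_mark", "question"): 2.5,
        ("is_indented", "answer"): 1.5,
    },
}


def transition_probs(prev_state, obs):
    """Return P(next_state | obs, prev_state) as a normalized exponential model."""
    active = [name for name, on in features(obs).items() if on]
    weights = WEIGHTS[prev_state]
    # Score each candidate next state by summing the weights of active features.
    scores = {s: sum(weights.get((f, s), 0.0) for f in active) for s in STATES}
    # Normalize over next states so the probabilities sum to one.
    z = sum(math.exp(v) for v in scores.values())
    return {s: math.exp(v) / z for s, v in scores.items()}
```

Calling `transition_probs("answer", "How do I reset it?")` returns a distribution over `STATES` that favors `"question"` under these assumed weights, because the question-mark feature is active. Unlike an HMM emission model, nothing here requires the features to be independent or to form a distribution over observations.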

Cite

Text

McCallum et al. "Maximum Entropy Markov Models for Information Extraction and Segmentation." International Conference on Machine Learning, 2000.

Markdown

[McCallum et al. "Maximum Entropy Markov Models for Information Extraction and Segmentation." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/mccallum2000icml-maximum/)

BibTeX

@inproceedings{mccallum2000icml-maximum,
  title     = {{Maximum Entropy Markov Models for Information Extraction and Segmentation}},
  author    = {McCallum, Andrew and Freitag, Dayne and Pereira, Fernando C. N.},
  booktitle = {International Conference on Machine Learning},
  year      = {2000},
  pages     = {591-598},
  url       = {https://mlanthology.org/icml/2000/mccallum2000icml-maximum/}
}