Two Applications of Statistical Modelling to Natural Language Processing

Abstract

Each week the Columbia-Presbyterian Medical Center collects several megabytes of English text transcribed from radiologists’ dictation and notes of their interpretations of medical diagnostic x-rays. The goal is to automate the extraction of diagnoses from these natural-language reports. This paper reports on two aspects of this project that require advanced statistical methods. First, the identification of pairs of words and phrases that tend to appear together (collocate) uses a hierarchical Bayesian model that adjusts to different word and word-pair distributions in different bodies of text. Second, we present an analysis of data from experiments that compare the performance of the computer diagnostic program with that of a panel of physician and lay readers of randomly sampled texts. A measure of inter-subject distance with respect to the diagnoses is defined, for which estimated variances and covariances are easily computed. This allows statistical conclusions to be drawn about the similarities and dissimilarities among diagnoses by the various programs and experts.

Cite

Text

Du Mouchel et al. "Two Applications of Statistical Modelling to Natural Language Processing." Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, 1995.

Markdown

[Du Mouchel et al. "Two Applications of Statistical Modelling to Natural Language Processing." Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, 1995.](https://mlanthology.org/aistats/1995/mouchel1995aistats-two/)

BibTeX

@inproceedings{mouchel1995aistats-two,
  title     = {{Two Applications of Statistical Modelling to Natural Language Processing}},
  author    = {Du Mouchel, William and Friedman, Carol and Hripcsak, George and Johnson, Stephen B. and Clayton, Paul D.},
  booktitle = {Pre-proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics},
  year      = {1995},
  pages     = {192--198},
  volume    = {R0},
  url       = {https://mlanthology.org/aistats/1995/mouchel1995aistats-two/}
}