Discriminative Topic Segmentation of Text and Speech

Abstract

We explore the automated discovery of topically coherent segments in speech or text sequences. We present two new discriminative topic segmentation algorithms that employ a new measure of text similarity based on word co-occurrence. Both algorithms work by finding extrema in the similarity signal computed over the text. The second algorithm additionally uses a compact, support-vector-based description of a window of text or speech observations in word similarity space to overcome the noise introduced by speech recognition errors and off-topic content. In experiments on speech and text news streams, we show that these algorithms outperform previous methods. We observe that topic segmentation of speech recognizer output is more difficult than segmentation of text streams; however, we demonstrate that using a lattice of competing hypotheses, rather than only the one-best hypothesis, as input to the segmentation algorithm improves its performance.
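The abstract states only that both algorithms locate extrema in a similarity signal computed over the text. The following is a minimal sketch of that general idea, not the authors' method. It assumes plain bag-of-words cosine similarity between adjacent windows of sentences in place of the paper's co-occurrence-based measure, and it omits both the support-vector window description and the lattice input. Candidate boundaries are placed at local minima of the signal that fall below a threshold. The function names and the `window` and `threshold` parameters are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_signal(sentences: list[str], window: int) -> list[tuple[int, float]]:
    """Return (gap, similarity) pairs, where gap i lies between sentences i-1 and i.

    ASSUMPTION: bag-of-words cosine stands in for the paper's
    co-occurrence-based similarity measure.
    """
    signal = []
    for gap in range(1, len(sentences)):
        left = Counter(w for s in sentences[max(0, gap - window):gap] for w in s.split())
        right = Counter(w for s in sentences[gap:gap + window] for w in s.split())
        signal.append((gap, cosine(left, right)))
    return signal


def segment(sentences: list[str], window: int = 2, threshold: float = 0.1) -> list[int]:
    """Hypothetical segmenter: boundaries at local minima of the signal below `threshold`."""
    signal = similarity_signal(sentences, window)
    boundaries = []
    for k, (gap, value) in enumerate(signal):
        # Treat missing neighbours at either end as +infinity.
        prev_value = signal[k - 1][1] if k > 0 else math.inf
        next_value = signal[k + 1][1] if k + 1 < len(signal) else math.inf
        if value <= prev_value and value <= next_value and value < threshold:
            boundaries.append(gap)
    return boundaries
```

For example, with the four sentences "stocks fell as markets slid", "markets slid and stocks fell again", "the team won the cup final", and "fans cheered as the team won", `segment(...)` returns `[2]`, which places a boundary between the finance and sports sentences. The threshold keeps shallow dips inside a topic from being reported as boundaries.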

Cite

Text

Mohri et al. "Discriminative Topic Segmentation of Text and Speech." Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.

Markdown

[Mohri et al. "Discriminative Topic Segmentation of Text and Speech." Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010.](https://mlanthology.org/aistats/2010/mohri2010aistats-discriminative/)

BibTeX

@inproceedings{mohri2010aistats-discriminative,
  title     = {{Discriminative Topic Segmentation of Text and Speech}},
  author    = {Mohri, Mehryar and Moreno, Pedro and Weinstein, Eugene},
  booktitle = {Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics},
  year      = {2010},
  pages     = {533--540},
  volume    = {9},
  url       = {https://mlanthology.org/aistats/2010/mohri2010aistats-discriminative/}
}