Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks

Abstract

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
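The abstract does not spell out the mechanism. In CTC, the network emits a distribution over the labels plus one extra "blank" symbol at every input frame. A frame-level path is mapped to a label sequence by merging repeated symbols and then deleting blanks. The probability of a labelling, p(l | x), is the sum over all paths that map to it, and a forward recursion over the label sequence (with blanks inserted around every label) computes this sum efficiently.

The following is a minimal, non-authoritative sketch of that computation. It makes several assumptions: blank is symbol index 0, the per-frame outputs are already normalised, and the arithmetic is done in linear space, which will underflow on long inputs, whereas practical implementations use log space or rescaling. The function names `collapse`, `ctc_label_probability` and `best_path_decode` are hypothetical illustrations and not an API from the paper.

```python
from typing import List, Sequence

BLANK = 0  # assumption: symbol index 0 is reserved for the CTC blank


def collapse(path: Sequence[int]) -> List[int]:
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out: List[int] = []
    prev = None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def ctc_label_probability(probs: Sequence[Sequence[float]], label: Sequence[int]) -> float:
    """p(label | x): sum over all paths that collapse to `label` (forward recursion).

    probs[t][k] is the (already normalised) output for symbol k at frame t.
    Works in linear space, so it will underflow on long inputs.
    """
    T = len(probs)
    if T == 0:
        return 1.0 if len(label) == 0 else 0.0
    # Extended sequence: a blank before, between and after every label.
    ext = [BLANK]
    for k in label:
        ext += [k, BLANK]
    S = len(ext)
    # Paths may start with the leading blank or with the first label.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]  # advance by one position
            # Skipping the blank is only allowed between two *different* labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # Paths must end on the last label or on the trailing blank.
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


def best_path_decode(probs: Sequence[Sequence[float]]) -> List[int]:
    """Approximate decoding: take the most probable symbol per frame, then collapse."""
    return collapse([max(range(len(frame)), key=frame.__getitem__) for frame in probs])


if __name__ == "__main__":
    # Three frames over {blank, label 1, label 2}.
    probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.1, 0.3]]
    print(ctc_label_probability(probs, [1]))  # ≈ 0.316
```

In this example, the six paths (1,0,0), (0,1,0), (0,0,1), (1,1,0), (0,1,1) and (1,1,1) all collapse to the labelling [1], and their probabilities sum to 0.316. `best_path_decode` is only a cheap approximation: the single most probable path does not necessarily correspond to the most probable labelling.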

Cite

Text

Graves et al. "Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks." International Conference on Machine Learning, 2006. doi:10.1145/1143844.1143891

Markdown

[Graves et al. "Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks." International Conference on Machine Learning, 2006.](https://mlanthology.org/icml/2006/graves2006icml-connectionist/) doi:10.1145/1143844.1143891

BibTeX

@inproceedings{graves2006icml-connectionist,
  title     = {{Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks}},
  author    = {Graves, Alex and Fernández, Santiago and Gomez, Faustino J. and Schmidhuber, Jürgen},
  booktitle = {International Conference on Machine Learning},
  year      = {2006},
  pages     = {369--376},
  doi       = {10.1145/1143844.1143891},
  url       = {https://mlanthology.org/icml/2006/graves2006icml-connectionist/}
}