Connectionist Architectures for Multi-Speaker Phoneme Recognition
Abstract
We present a number of Time-Delay Neural Network (TDNN) based architectures for multi-speaker phoneme recognition (/b,d,g/ task). We use speech from two female and four male speakers to compare the performance of the various architectures against a baseline recognition rate of 95.9% for a single TDNN on the six-speaker /b,d,g/ task. This series of modular designs leads to a highly modular multi-network architecture capable of performing the six-speaker recognition task at the speaker-dependent rate of 98.4%. In addition to its high recognition rate, the so-called "Meta-Pi" architecture learns, without direct supervision, to recognize the speech of one particular male speaker using internal models of other male speakers exclusively.
Cite
Text
Hampshire and Waibel. "Connectionist Architectures for Multi-Speaker Phoneme Recognition." Neural Information Processing Systems, 1989.
Markdown
[Hampshire and Waibel. "Connectionist Architectures for Multi-Speaker Phoneme Recognition." Neural Information Processing Systems, 1989.](https://mlanthology.org/neurips/1989/ii1989neurips-connectionist/)
BibTeX
@inproceedings{ii1989neurips-connectionist,
title = {{Connectionist Architectures for Multi-Speaker Phoneme Recognition}},
author = {Hampshire, II, John B. and Waibel, Alex},
booktitle = {Neural Information Processing Systems},
year = {1989},
pages = {203--210},
url = {https://mlanthology.org/neurips/1989/ii1989neurips-connectionist/}
}