Training Stochastic Model Recognition Algorithms as Networks Can Lead to Maximum Mutual Information Estimation of Parameters
Abstract
One of the attractions of neural network approaches to pattern recognition is the use of a discrimination-based training method. We show that once we have modified the output layer of a multilayer perceptron to provide mathematically correct probability distributions, and replaced the usual squared error criterion with a probability-based score, the result is equivalent to Maximum Mutual Information training, which has been used successfully to improve the performance of hidden Markov models for speech recognition. If the network is specially constructed to perform the recognition computations of a given kind of stochastic model based classifier then we obtain a method for discrimination-based training of the parameters of the models. Examples include an HMM-based word discriminator, which we call an 'Alphanet'.
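The following is a minimal NumPy sketch, not the paper's own code, of the two modifications the abstract describes: a softmax output layer, which turns final-layer activations into a proper probability distribution over classes, and a log-probability score in place of squared error. With one correct class per example, maximizing the log posterior of the correct class is the Maximum Mutual Information criterion in this setting. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def softmax(a):
    """Normalized exponentials: a mathematically correct output distribution."""
    e = np.exp(a - a.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def mmi_score(activations, correct_class):
    """Log posterior of the correct class, log P(c | x).

    Maximizing this probability-based score (rather than minimizing squared
    error on the outputs) corresponds to MMI-style discriminative training.
    """
    p = softmax(activations)
    return np.log(p[np.arange(len(p)), correct_class])

# Hypothetical toy usage: 3 examples, 4 classes.
acts = np.random.randn(3, 4)
labels = np.array([0, 2, 1])
print(mmi_score(acts, labels))           # per-example log posteriors
print(-mmi_score(acts, labels).mean())   # equivalently, mean cross-entropy loss
```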
Cite
Text
Bridle. "Training Stochastic Model Recognition Algorithms as Networks Can Lead to Maximum Mutual Information Estimation of Parameters." Neural Information Processing Systems, 1989.Markdown
[Bridle. "Training Stochastic Model Recognition Algorithms as Networks Can Lead to Maximum Mutual Information Estimation of Parameters." Neural Information Processing Systems, 1989.](https://mlanthology.org/neurips/1989/bridle1989neurips-training/)BibTeX
@inproceedings{bridle1989neurips-training,
title = {{Training Stochastic Model Recognition Algorithms as Networks Can Lead to Maximum Mutual Information Estimation of Parameters}},
author = {Bridle, John S.},
booktitle = {Neural Information Processing Systems},
year = {1989},
pages = {211-217},
url = {https://mlanthology.org/neurips/1989/bridle1989neurips-training/}
}