Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization
Abstract
An auditory "scene", composed of overlapping acoustic sources, can be viewed as a complex object whose constituent parts are the individual sources. Pitch is known to be an important cue for auditory scene analysis. In this paper, with the goal of building agents that operate in human environments, we describe a real-time system to identify the presence of one or more voices and compute their pitch. The signal processing in the front end is based on instantaneous frequency estimation, a method for tracking the partials of voiced speech, while the pattern-matching in the back end is based on nonnegative matrix factorization, an unsupervised algorithm for learning the parts of complex objects. While supporting a framework to analyze complicated auditory scenes, our system maintains real-time operability and state-of-the-art performance in clean speech.
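The abstract names nonnegative matrix factorization (NMF) as the back-end pattern matcher. As a rough illustration only, the sketch below runs generic NMF with the standard multiplicative updates for squared error on a toy matrix. It is not the authors' system: the toy "spectral templates", the function names (`nmf`, `matmul`, `frob_error`), the rank, and the iteration count are all assumptions made for this example, and the instantaneous-frequency front end is omitted entirely.

```python
import random

def matmul(A, B):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    # Squared Frobenius norm of V - W H.
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, rank, iters=200, eps=1e-9, seed=0):
    # Generic multiplicative-update NMF: V (n x m) ~ W (n x rank) H (rank x m).
    # Positive initialization keeps every factor entry nonnegative throughout.
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Hypothetical toy data: two nonnegative "templates" over 6 frequency bins,
# mixed with nonnegative activations across 4 frames.
templates = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
activations = [[1.0, 0.0, 0.5, 1.0], [0.0, 1.0, 0.5, 1.0]]
V = matmul(templates, activations)

W, H = nmf(V, rank=2)
err = frob_error(V, W, H)
```

In this toy setting the columns of `W` play the role of learned parts and the rows of `H` indicate how strongly each part is present in each frame, which is the sense in which NMF decomposes a mixture into constituent sources.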
Cite
Text
Sha and Saul. "Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization." Neural Information Processing Systems, 2004.
Markdown
[Sha and Saul. "Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/sha2004neurips-realtime/)
BibTeX
@inproceedings{sha2004neurips-realtime,
title = {{Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization}},
author = {Sha, Fei and Saul, Lawrence K.},
booktitle = {Neural Information Processing Systems},
year = {2004},
pages = {1233-1240},
url = {https://mlanthology.org/neurips/2004/sha2004neurips-realtime/}
}