Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization

Abstract

An auditory "scene", composed of overlapping acoustic sources, can be viewed as a complex object whose constituent parts are the individual sources. Pitch is known to be an important cue for auditory scene analysis. In this paper, with the goal of building agents that operate in human environments, we describe a real-time system to identify the presence of one or more voices and compute their pitch. The signal processing in the front end is based on instantaneous frequency estimation, a method for tracking the partials of voiced speech, while the pattern-matching in the back end is based on nonnegative matrix factorization, an unsupervised algorithm for learning the parts of complex objects. While supporting a framework to analyze complicated auditory scenes, our system maintains real-time operability and state-of-the-art performance in clean speech.
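For readers unfamiliar with nonnegative matrix factorization, the following is a minimal, generic sketch of the idea: approximate a nonnegative matrix V by a product W H of nonnegative factors, here using the widely known multiplicative updates for the squared-error objective. It is not the system described in the abstract. The function names (`nmf`, `matmul`, `frobenius_error`), the choice of objective, the toy matrix, and all parameter values are assumptions made purely for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2
               for row_v, row_wh in zip(V, WH)
               for v, wh in zip(row_v, row_wh))


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Generic NMF sketch using multiplicative updates (squared-error objective).

    Illustrative only: the initialization, iteration count, and eps are
    assumptions, not settings taken from the paper.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialization keeps every update well defined.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Toy nonnegative matrix (hypothetical data, not from the paper).
V = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [1.0, 1.0, 3.0],
     [2.0, 0.0, 4.0]]
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H))
```

Because every update multiplies the current entry by a ratio of nonnegative quantities, W and H stay nonnegative throughout, which is what lets the columns of W be read as additive "parts" of the data.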

Cite

Text

Sha and Saul. "Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization." Neural Information Processing Systems, 2004.

Markdown

[Sha and Saul. "Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/sha2004neurips-realtime/)

BibTeX

@inproceedings{sha2004neurips-realtime,
  title     = {{Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization}},
  author    = {Sha, Fei and Saul, Lawrence K.},
  booktitle = {Neural Information Processing Systems},
  year      = {2004},
  pages     = {1233-1240},
  url       = {https://mlanthology.org/neurips/2004/sha2004neurips-realtime/}
}