Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch

Abstract

We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real time multimedia applications: a voice-to-MIDI player that synthesizes electronic music from vocalized melodies, and an audiovisual Karaoke machine with multimodal feedback. Both applications run on a laptop and display the user’s pitch scrolling across the screen as he or she sings into the computer.
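To make the "error signal of a least squares fit" idea concrete, here is a minimal sketch of one standard way to estimate the frequency of a near-sinusoidal signal without FFTs or peak picking. It relies on the identity x[n-1] + x[n+1] = 2 cos(ω) x[n], which holds exactly for a pure sinusoid, and fits the single coefficient 2 cos(ω) by least squares. The function name `estimate_frequency`, the use of the normalized residual as a voicing cue, and the choice of this particular recurrence are assumptions made for illustration; the abstract does not specify the authors' exact formulation, and the purposeful nonlinearity and any filtering stages are not shown.

```python
import math


def estimate_frequency(samples, sample_rate):
    """Fit x[n-1] + x[n+1] = a * x[n] by least squares, where a = 2*cos(w).

    Returns (frequency_hz, relative_error). The relative error is near 0 for
    a clean sinusoid and near 1 for noise. Hypothetical helper, not the
    authors' published implementation.
    """
    num = 0.0  # running sum of x[n] * (x[n-1] + x[n+1])
    den = 0.0  # running sum of x[n]^2
    tot = 0.0  # running sum of (x[n-1] + x[n+1])^2

    # Each iteration only adds one term to each sum, so the same
    # quantities could be maintained incrementally, one sample at a time.
    for n in range(1, len(samples) - 1):
        s = samples[n - 1] + samples[n + 1]
        num += samples[n] * s
        den += samples[n] * samples[n]
        tot += s * s

    if den == 0.0 or tot == 0.0:
        return None, 1.0  # silent input: no estimate, maximal error

    a = num / den  # least squares estimate of 2*cos(w)

    # Residual of the fit at the optimum: sum (s - a*x)^2 = tot - num^2/den.
    residual = max(0.0, tot - num * num / den)
    relative_error = residual / tot

    # Clamp before acos so rounding or noise cannot leave the valid domain.
    a = max(-2.0, min(2.0, a))
    omega = math.acos(a / 2.0)  # radians per sample
    return omega * sample_rate / (2.0 * math.pi), relative_error
```

Called on a buffer of samples together with its sampling rate, the function returns a frequency estimate and a fit error. A small error suggests the buffer is well described by a single sinusoid, which is one plausible way a fit error could double as a voiced/unvoiced indicator.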

Cite

Text

Saul et al. "Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch." Neural Information Processing Systems, 2002.

Markdown

[Saul et al. "Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/saul2002neurips-real/)

BibTeX

@inproceedings{saul2002neurips-real,
  title     = {{Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch}},
  author    = {Saul, Lawrence K. and Lee, Daniel D. and Isbell, Charles L. and LeCun, Yann},
  booktitle = {Neural Information Processing Systems},
  year      = {2002},
  pages     = {1205-1212},
  url       = {https://mlanthology.org/neurips/2002/saul2002neurips-real/}
}