Interfacing Sound Stream Segregation to Automatic Speech Recognition - Preliminary Results on Listening to Several Sounds Simultaneously

Okuno, Hiroshi G.; Nakatani, Tomohiro; Kawabata, Takeshi

AAAI 1996, pp. 1082-1089

Abstract

This paper reports the preliminary results of experiments on listening to several sounds at once. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition (ASR). Speech stream segregation (SSS) is modeled as a process of extracting harmonic fragments, grouping these extracted harmonic fragments, and substituting some sounds for non-harmonic parts of groups. This system is implemented by extending the harmonic-based stream segregation system reported at AAAI-94 and IJCAI-95. The main problem in interfacing SSS with HMM-based ASR is how to improve recognition performance which is degraded by spectral distortion of segregated sounds caused mainly by the binaural input, grouping, and residue substitution. Our solution is to re-train the parameters of the HMM with training data binauralized for four directions, to group harmonic fragments according to their directions, and to substitute the res...

Cite

Text

Okuno et al. "Interfacing Sound Stream Segregation to Automatic Speech Recognition - Preliminary Results on Listening to Several Sounds Simultaneously." AAAI Conference on Artificial Intelligence, 1996.

Markdown

[Okuno et al. "Interfacing Sound Stream Segregation to Automatic Speech Recognition - Preliminary Results on Listening to Several Sounds Simultaneously." AAAI Conference on Artificial Intelligence, 1996.](https://mlanthology.org/aaai/1996/okuno1996aaai-interfacing/)

BibTeX

@inproceedings{okuno1996aaai-interfacing,
  title     = {{Interfacing Sound Stream Segregation to Automatic Speech Recognition - Preliminary Results on Listening to Several Sounds Simultaneously}},
  author    = {Okuno, Hiroshi G. and Nakatani, Tomohiro and Kawabata, Takeshi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1996},
  pages     = {1082--1089},
  url       = {https://mlanthology.org/aaai/1996/okuno1996aaai-interfacing/}
}