End-to-End Signal Factorization for Speech: Identity, Content, and Style

Abstract

Preliminary experiments in this dissertation show that it is possible to factorize specific types of information from the speech signal in an abstract embedding space using machine learning. This information includes characteristics of the recording environment, speaking style, and speech quality. Based on these findings, a new technique is proposed to factorize multiple types of information from the speech signal simultaneously using a combination of state-of-the-art machine learning methods for speech processing. Successful speech signal factorization will lead to advances across many speech technologies, including improved speaker identification, detection of speech audio deep fakes, and controllable expression in speech synthesis.
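To make the factorization idea concrete, the sketch below shows one way such a model could be structured: three independent encoders produce identity, content, and style embeddings from the same acoustic features, and a decoder reconstructs the features from their concatenation. This is an illustrative assumption, not the author's implementation; all module names, dimensions, and the reconstruction objective are hypothetical, and in practice each embedding would also be trained with its own auxiliary objective (e.g. speaker classification for identity).

# Minimal illustrative sketch (not the paper's method): three encoders
# factorize mel-spectrogram features into identity, content, and style
# embeddings; a decoder reconstructs the features from all three.
import torch
import torch.nn as nn


class FactorizationSketch(nn.Module):
    def __init__(self, n_mels=80, emb_dim=64):
        super().__init__()

        # One encoder per factor; identity and style are pooled over time,
        # content is kept framewise. Dimensions are assumptions.
        def make_encoder():
            return nn.Sequential(
                nn.Linear(n_mels, 128), nn.ReLU(), nn.Linear(128, emb_dim)
            )

        self.identity_enc = make_encoder()
        self.content_enc = make_encoder()
        self.style_enc = make_encoder()
        # Decoder maps the concatenated factors back to acoustic features.
        self.decoder = nn.Sequential(
            nn.Linear(3 * emb_dim, 128), nn.ReLU(), nn.Linear(128, n_mels)
        )

    def forward(self, mels):  # mels: (batch, time, n_mels)
        identity = self.identity_enc(mels).mean(dim=1)  # (batch, emb_dim)
        content = self.content_enc(mels)                # (batch, time, emb_dim)
        style = self.style_enc(mels).mean(dim=1)        # (batch, emb_dim)
        time = mels.size(1)
        # Broadcast the utterance-level factors over time and decode.
        z = torch.cat(
            [identity.unsqueeze(1).expand(-1, time, -1),
             content,
             style.unsqueeze(1).expand(-1, time, -1)],
            dim=-1,
        )
        return self.decoder(z), (identity, content, style)


# Usage example with random features, for illustration only.
model = FactorizationSketch()
mels = torch.randn(4, 200, 80)
recon, factors = model(mels)
loss = nn.functional.mse_loss(recon, mels)
loss.backward()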

Cite

Text

Williams. "End-to-End Signal Factorization for Speech: Identity, Content, and Style." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/746

Markdown

[Williams. "End-to-End Signal Factorization for Speech: Identity, Content, and Style." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/williams2020ijcai-end/) doi:10.24963/IJCAI.2020/746

BibTeX

@inproceedings{williams2020ijcai-end,
  title     = {{End-to-End Signal Factorization for Speech: Identity, Content, and Style}},
  author    = {Williams, Jennifer},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {5212--5213},
  doi       = {10.24963/IJCAI.2020/746},
  url       = {https://mlanthology.org/ijcai/2020/williams2020ijcai-end/}
}