Open-Source Conversational AI with SpeechBrain 1.0

Abstract

SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, and text-to-speech. It promotes transparency and replicability by releasing both the pre-trained models and the complete recipes of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models and tasks. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.

Cite

Text

Ravanelli et al. "Open-Source Conversational AI with SpeechBrain 1.0." Machine Learning Open Source Software, 2024.

Markdown

[Ravanelli et al. "Open-Source Conversational AI with SpeechBrain 1.0." Machine Learning Open Source Software, 2024.](https://mlanthology.org/mloss/2024/ravanelli2024jmlr-opensource/)

BibTeX

@article{ravanelli2024jmlr-opensource,
  title     = {{Open-Source Conversational AI with SpeechBrain 1.0}},
  author    = {Ravanelli, Mirco and Parcollet, Titouan and Moumen, Adel and de Langen, Sylvain and Subakan, Cem and Plantinga, Peter and Wang, Yingzhi and Mousavi, Pooneh and Della Libera, Luca and Ploujnikov, Artem and Paissan, Francesco and Borra, Davide and Zaiem, Salah and Zhao, Zeyu and Zhang, Shucong and Karakasidis, Georgios and Yeh, Sung-Lin and Champion, Pierre and Rouhe, Aku and Braun, Rudolf and Mai, Florian and Zuluaga-Gomez, Juan and Mousavi, Seyed Mahed and Nautsch, Andreas and Nguyen, Ha and Liu, Xuechen and Sagar, Sangeet and Duret, Jarod and Mdhaffar, Salima and Laperrière, Gaëlle and Rouvier, Mickael and De Mori, Renato and Estève, Yannick},
  journal   = {Machine Learning Open Source Software},
  year      = {2024},
  pages     = {1--11},
  volume    = {25},
  url       = {https://mlanthology.org/mloss/2024/ravanelli2024jmlr-opensource/}
}