A Cross-Species Neural Foundation Model for End-to-End Speech Decoding

Abstract

Speech brain–computer interfaces (BCIs) aim to restore communication for people with paralysis by translating neural activity into text. Most systems use cascaded frameworks that decode phonemes before assembling sentences with an n-gram language model (LM), which prevents joint optimization of all stages. Here, we introduce an end-to-end BraIn-to-Text (BIT) framework that translates neural activity into coherent sentences using a single differentiable neural network. Central to our approach is a cross-task, cross-species pretrained neural encoder, whose representations transfer to both attempted and imagined speech. In a cascaded setting with an n-gram LM, the pretrained encoder establishes a new state-of-the-art (SOTA) on the Brain-to-Text ’24 and ’25 benchmarks. Integrated end-to-end with audio large language models (LLMs) and trained with contrastive learning for cross-modal alignment, BIT reduces the word error rate (WER) of the prior end-to-end method from 24.69% to 10.22%. Notably, we find that small-scale audio-LLMs markedly improve end-to-end decoding. Beyond record-setting performance, BIT aligns attempted and imagined speech embeddings to enable cross-task generalization. Altogether, our approach advances the integration of large, diverse neural datasets, paving the way for an end-to-end decoding framework that supports seamless, differentiable optimization.
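For readers unfamiliar with the metric quoted above, WER is conventionally the word-level edit distance (substitutions, insertions, and deletions) between a decoded sentence and its reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and benchmark scoring may differ in text normalization and in how errors are aggregated across sentences.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("the quick brown fox", "the quick fox"))
```

Under this definition, a WER of 10.22% corresponds to roughly one word-level error for every ten reference words.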

Cite

Text

Zhang et al. "A Cross-Species Neural Foundation Model for End-to-End Speech Decoding." International Conference on Learning Representations, 2026.

BibTeX

@inproceedings{zhang2026iclr-crossspecies,
  title     = {{A Cross-Species Neural Foundation Model for End-to-End Speech Decoding}},
  author    = {Zhang, Yizi and He, Linyang and Fan, Chaofei and Liu, Tingkai and Yu, Han and Le, Trung and Li, Jingyuan and Linderman, Scott and Duncker, Lea and Willett, Francis R and Mesgarani, Nima and Paninski, Liam},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-crossspecies/}
}