SpikingBrain: Spiking Brain-Inspired Large Models

Pan, Yuqi; Feng, Yupeng; Zhuang, JingHao; Ding, Siyu; Xu, Han; Liu, Zehao; Sun, Bohan; Chou, Yuhong; Qiu, Xuerui; Deng, Anlin; Hu, Anjie; Wang, Shurong; Zhou, Peng; Yao, Man; Wu, Jibin; Yang, Jian; 孙国梁; Xu, Bo; Li, Guoqi

TMLR 2026

Abstract

Mainstream Transformer-based large language models (LLMs) face significant efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory (the key-value cache) grows linearly with it. These constraints limit their ability to process long sequences. In addition, building large models on non-NVIDIA computing platforms makes stable and efficient training and deployment difficult. To address these issues, we introduce SpikingBrain, a family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain is built on a MetaX GPU cluster and focuses on three core aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline compatible with existing LLMs, along with a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear mixture-of-experts (MoE) LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms, and our training framework supports weeks of stable training on hundreds of MetaX GPUs with Model FLOPs Utilization (MFU) at expected levels. SpikingBrain achieves performance comparable to open-source Transformer baselines while using relatively little training data (continual pre-training on approximately 150B tokens). Our models also significantly improve long-context efficiency and support inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B achieves a more than 100× speedup in Time to First Token (TTFT) for 4M-token sequences. Furthermore, the proposed spiking scheme achieves 69.15% sparsity, enabling low-power operation.
Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.
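The abstract rests on two generic ideas: linear attention, which can run as a recurrence with a fixed-size state instead of a key-value cache that grows with every token, and spike coding, which turns activations into sparse integer spike trains. The plain-Python sketch below illustrates both under stated assumptions. It is not the SpikingBrain implementation: the elu(x)+1 feature map, the mean-|x| adaptive threshold, the clipping bound `T`, and every function name here are hypothetical choices made for illustration.

```python
import math

def phi(x):
    # Positive feature map elu(x) + 1 (an assumed, commonly used choice).
    return x + 1.0 if x > 0 else math.exp(x)

def feat(vec):
    return [phi(x) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_attention_recurrent(qs, ks, vs):
    """Causal linear attention run as an RNN. The state S (d_k x d_v) and the
    normalizer z (d_k) keep the same size however long the sequence gets."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    outs = []
    for q, k, v in zip(qs, ks, vs):
        fq, fk = feat(q), feat(k)
        for i in range(d_k):
            z[i] += fk[i]                      # accumulate sum of phi(k)
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]        # accumulate sum of phi(k) v^T
        denom = dot(fq, z)                     # > 0 because phi > 0
        outs.append([sum(fq[i] * S[i][j] for i in range(d_k)) / denom
                     for j in range(d_v)])
    return outs

def linear_attention_parallel(qs, ks, vs):
    """Same outputs, computed by revisiting every earlier token (cost grows with t)."""
    outs = []
    for t, q in enumerate(qs):
        fq = feat(q)
        w = [dot(fq, feat(ks[i])) for i in range(t + 1)]
        total = sum(w)
        outs.append([sum(w[i] * vs[i][j] for i in range(t + 1)) / total
                     for j in range(len(vs[0]))])
    return outs

def spike_encode(xs, T=4):
    """Hypothetical signed spike coding: threshold theta = mean |x| (assumed),
    count c = round(x / theta) clipped to [-T, T], expanded into a length-T
    train of +1/-1 spikes padded with zeros (no spike)."""
    theta = sum(abs(x) for x in xs) / len(xs) or 1.0
    trains = []
    for x in xs:
        c = max(-T, min(T, round(x / theta)))
        sign = 1 if c > 0 else -1
        trains.append([sign] * abs(c) + [0] * (T - abs(c)))
    return theta, trains

def spike_decode(theta, trains):
    # Each activation is approximated by (net spike count) * theta.
    return [sum(tr) * theta for tr in trains]

def sparsity(trains):
    # Fraction of time slots that carry no spike.
    slots = [s for tr in trains for s in tr]
    return sum(1 for s in slots if s == 0) / len(slots)
```

For example, `spike_encode([0.0, 0.1, -2.0, 0.05, 1.0, 0.0, -0.1, 0.0])` yields spike counts of 0, 0, -4, 0, 2, 0, 0, 0, so 26 of the 32 slots are empty and `sparsity` returns 0.8125. The recurrent and parallel attention functions return the same outputs, but only the recurrent one keeps constant memory per step. This toy does not reproduce the 69.15% sparsity reported above, which comes from the authors' own scheme.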

Cite

Text

Pan et al. "SpikingBrain: Spiking Brain-Inspired Large Models." Transactions on Machine Learning Research, 2026.

Markdown

[Pan et al. "SpikingBrain: Spiking Brain-Inspired Large Models." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/pan2026tmlr-spikingbrain/)

BibTeX

@article{pan2026tmlr-spikingbrain,
  title     = {{SpikingBrain: Spiking Brain-Inspired Large Models}},
  author    = {Pan, Yuqi and Feng, Yupeng and Zhuang, JingHao and Ding, Siyu and Xu, Han and Liu, Zehao and Sun, Bohan and Chou, Yuhong and Qiu, Xuerui and Deng, Anlin and Hu, Anjie and Wang, Shurong and Zhou, Peng and Yao, Man and Wu, Jibin and Yang, Jian and 孙国梁,  and Xu, Bo and Li, Guoqi},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/pan2026tmlr-spikingbrain/}
}