Brain-Inspired Sparse Training Enables Transformers and LLMs to Perform as Fully Connected

Abstract

This study extends our current knowledge of applying brain-inspired network science principles to train artificial neural networks (ANNs) with sparse connectivity. Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in dynamic sparse training (DST). CHT leverages a gradient-free, topology-driven link regrowth mechanism, which has been shown to achieve an ultra-sparse advantage (1% connectivity or lower) over fully connected networks across various tasks. Yet CHT suffers from two main drawbacks: the high time complexity of its link predictor and a tendency to get stuck in epitopological local minima. Here, we propose a GPU-friendly, matrix-multiplication-based approximation of the CH link predictor that reduces the computational complexity to $\mathcal{O}(N^3)$, enabling a fast implementation of CHT in large-scale models. Moreover, we introduce the Cannistraci-Hebb Training soft rule (CHTs), which adopts a flexible strategy for sampling connections during both link removal and regrowth, balancing exploration and exploitation of the network topology. To further improve performance, we integrate CHTs with a sigmoid gradual density decay strategy, referred to as CHTss. Empirical results show that 1) using 5% of the connections, CHTss outperforms fully connected networks in two Transformer-based machine translation tasks; 2) using 30% of the connections, CHTss achieves superior performance compared to other dynamic sparse training methods in language modeling (LLaMA-130M) across different sparsity levels, and it surpasses the fully connected counterpart in zero-shot evaluations.
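The two mechanisms named in the abstract, soft (stochastic) sampling of connections and a sigmoid density decay schedule, can be illustrated with a minimal sketch. The code below is an assumption-labeled illustration, not the paper's implementation: the function names, the temperature parameter, and the exact sigmoid parameterization are placeholders chosen for readability.

```python
import math
import torch

def sigmoid_density_schedule(step, total_steps, d_init=1.0, d_final=0.30, steepness=10.0):
    """Illustrative sigmoid decay of network density from d_init to d_final.

    NOTE: the exact parameterization used by CHTss may differ; this only shows
    the shape of a gradual, S-curved density decay over training.
    """
    x = steepness * (step / total_steps - 0.5)   # center the steep region mid-training
    s = 1.0 / (1.0 + math.exp(-x))               # rises smoothly from ~0 to ~1
    return d_init - (d_init - d_final) * s       # density falls from d_init to d_final

def soft_sample_links(scores, n_links, temperature=1.0):
    """Sample n_links in proportion to softmax(scores / temperature).

    A soft rule like this replaces deterministic top-k selection with stochastic
    sampling, trading some exploitation (highest-scoring links) for exploration.
    The scoring (weight magnitude for removal, CH link-predictor score for
    regrowth) and the temperature schedule are assumptions for illustration.
    """
    probs = torch.softmax(scores.flatten() / temperature, dim=0)
    idx = torch.multinomial(probs, n_links, replacement=False)
    return idx  # flat indices of the sampled candidate links
```

A usage sketch: at each topology-update step, compute the target density with `sigmoid_density_schedule`, sample links to prune with `soft_sample_links` on negated weight magnitudes, and sample links to regrow with the same routine on link-predictor scores over currently absent connections.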

Cite

Text

Zhang et al. "Brain-Inspired Sparse Training Enables Transformers and LLMs to Perform as Fully Connected." ICLR 2025 Workshops: SLLM, 2025.

Markdown

[Zhang et al. "Brain-Inspired Sparse Training Enables Transformers and LLMs to Perform as Fully Connected." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/zhang2025iclrw-braininspired/)

BibTeX

@inproceedings{zhang2025iclrw-braininspired,
  title     = {{Brain-Inspired Sparse Training Enables Transformers and LLMs to Perform as Fully Connected}},
  author    = {Zhang, Yingtao and Zhao, Jialin and Wu, Wenjing and Liao, Ziheng and Michieli, Umberto and Cannistraci, Carlo Vittorio},
  booktitle = {ICLR 2025 Workshops: SLLM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/zhang2025iclrw-braininspired/}
}