Brain-Inspired Sparse Training Enables Transformers and LLMs to Perform as Fully Connected
Abstract
This study aims to extend our current knowledge of applying brain-inspired network science principles to the training of artificial neural networks (ANNs) with sparse connectivity. Cannistraci-Hebb training (CHT) is a brain-inspired method for growing connectivity in dynamic sparse training (DST). CHT leverages a gradient-free, topology-driven link regrowth mechanism, which has been shown to achieve an ultra-sparse (1% connectivity or lower) advantage over fully connected networks across various tasks. Yet, CHT suffers from two main drawbacks: the high time complexity of its link predictor and a tendency to get stuck in epitopological local minima. Here, we propose a matrix-multiplication, GPU-friendly approximation of the CH link predictor, which reduces the computational complexity to $\mathcal{O}(N^3)$, enabling a fast implementation of CHT in large-scale models. Moreover, we introduce the **C**annistraci-**H**ebb **T**raining **s**oft rule (CHTs), which adopts a flexible strategy for sampling connections in both link removal and regrowth, balancing exploration and exploitation of the network topology. To further improve performance, we integrate CHTs with a **s**igmoid gradual density decay strategy, referred to as CHTss. Empirical results show that 1) using 5% of the connections, CHTss outperforms fully connected networks in two Transformer-based machine translation tasks; 2) using 30% of the connections, CHTss achieves superior performance compared to other dynamic sparse training methods in language modeling (LLaMA-130M) across different sparsity levels, and it surpasses its fully connected counterpart in zero-shot evaluations.
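For a concrete picture of two ingredients named above, the minimal sketch below (an illustration, not the authors' released code) shows a degree-normalized, length-3-path (L3/CH-style) link-prediction score computed with three dense matrix multiplications, which is what makes such a predictor GPU-friendly and roughly $\mathcal{O}(N^3)$, together with a sigmoid-shaped density-decay schedule. The exact CH3-L3 weighting used in the paper and the schedule parameters `d_init`, `d_final`, and `k` are assumptions made here for illustration.

```python
import math
import torch


def l3_scores(A: torch.Tensor) -> torch.Tensor:
    """Degree-normalized length-3 path scores for a bipartite layer mask A (n_in x n_out).

    GPU-friendly approximation in the spirit of CH/L3 link predictors: three dense
    matmuls, hence ~O(N^3). The exact CH3-L3 normalization in the paper may differ.
    """
    deg_in = A.sum(dim=1, keepdim=True).clamp(min=1.0)    # degrees of input-side nodes
    deg_out = A.sum(dim=0, keepdim=True).clamp(min=1.0)   # degrees of output-side nodes
    A_norm = A / (deg_in.sqrt() * deg_out.sqrt())          # symmetric degree normalization
    # Score(i, j) ~ degree-weighted count of length-3 paths i -> j' -> i' -> j.
    return A_norm @ A_norm.t() @ A_norm


def sigmoid_density(step: int, total_steps: int,
                    d_init: float = 1.0, d_final: float = 0.05, k: float = 10.0) -> float:
    """Sigmoid-shaped decay of layer density from d_init to d_final (assumed schedule form)."""
    t = step / max(total_steps, 1)
    s = 1.0 / (1.0 + math.exp(k * (t - 0.5)))              # 1 -> 0 as training progresses
    return d_final + (d_init - d_final) * s


if __name__ == "__main__":
    torch.manual_seed(0)
    A = (torch.rand(64, 64) < 0.05).float()                 # a 5%-dense binary layer mask
    scores = l3_scores(A)
    scores[A.bool()] = float("-inf")                         # only score currently absent links
    new_links = torch.topk(scores.flatten(), k=10).indices   # regrow the 10 top-scored links
    print(new_links.tolist())
    print(sigmoid_density(step=500, total_steps=1000))       # density around the decay midpoint
```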
Cite
Text
Zhang et al. "Brain-Inspired Sparse Training Enables Transformers and LLMs to Perform as Fully Connected." ICLR 2025 Workshops: SLLM, 2025.
Markdown
[Zhang et al. "Brain-Inspired Sparse Training Enables Transformers and LLMs to Perform as Fully Connected." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/zhang2025iclrw-braininspired/)
BibTeX
@inproceedings{zhang2025iclrw-braininspired,
title = {{Brain-Inspired Sparse Training Enables Transformers and LLMs to Perform as Fully Connected}},
author = {Zhang, Yingtao and Zhao, Jialin and Wu, Wenjing and Liao, Ziheng and Michieli, Umberto and Cannistraci, Carlo Vittorio},
booktitle = {ICLR 2025 Workshops: SLLM},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/zhang2025iclrw-braininspired/}
}