Spiking Wavelet Transformer
Abstract
Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by emulating the event-driven processing manner of the brain. Incorporating Transformers with SNNs has shown promise for accuracy. However, they struggle to learn high-frequency patterns, such as moving edges and pixel-level brightness changes, because they rely on the global self-attention mechanism. Learning these high-frequency representations is challenging but essential for SNN-based event-driven vision. To address this issue, we propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner by leveraging the sparse wavelet transform. The critical component is a Frequency-Aware Token Mixer (FATM) with three branches: 1) spiking wavelet learner for spatial-frequency domain learning, 2) convolution-based learner for spatial feature extraction, and 3) spiking pointwise convolution for cross-channel information aggregation - with negative spike dynamics incorporated in 1) to enhance frequency representation. The FATM enables the SWformer to outperform vanilla Spiking Transformers in capturing high-frequency visual components, as evidenced by our empirical results. Experiments on both static and neuromorphic datasets demonstrate SWformer’s effectiveness in capturing spatial-frequency patterns in a multiplication-free and event-driven fashion, outperforming state-of-the-art SNNs. SWformer achieves a 22.03% reduction in parameter count, and a 2.52% performance improvement on the ImageNet dataset compared to vanilla Spiking Transformers. The code is available at: https://github.com/bic-L/Spiking-Wavelet-Transformer.
Cite
Text
Fang et al. "Spiking Wavelet Transformer." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73116-7_2Markdown
[Fang et al. "Spiking Wavelet Transformer." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/fang2024eccv-spiking/) doi:10.1007/978-3-031-73116-7_2BibTeX
@inproceedings{fang2024eccv-spiking,
title = {{Spiking Wavelet Transformer}},
author = {Fang, Yuetong and Wang, Ziqing and Zhang, Lingfeng and Cao, Jiahang and Chen, Honglei and Xu, Renjing},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73116-7_2},
url = {https://mlanthology.org/eccv/2024/fang2024eccv-spiking/}
}