Spiking Wavelet Transformer

Fang, Yuetong; Wang, Ziqing; Zhang, Lingfeng; Cao, Jiahang; Chen, Honglei; Xu, Renjing

doi:10.1007/978-3-031-73116-7_2

ECCV 2024

Abstract

Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by emulating the brain's event-driven processing. Combining Transformers with SNNs has shown promise for improving accuracy. However, these Spiking Transformers struggle to learn high-frequency patterns, such as moving edges and pixel-level brightness changes, because they rely on the global self-attention mechanism. Learning these high-frequency representations is challenging but essential for SNN-based event-driven vision. To address this issue, we propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that learns comprehensive spatial-frequency features in a spike-driven manner by leveraging the sparse wavelet transform. The critical component is a Frequency-Aware Token Mixer (FATM) with three branches: 1) a spiking wavelet learner for spatial-frequency domain learning, 2) a convolution-based learner for spatial feature extraction, and 3) a spiking pointwise convolution for cross-channel information aggregation. Negative spike dynamics are incorporated in 1) to enhance frequency representation. The FATM enables the SWformer to outperform vanilla Spiking Transformers in capturing high-frequency visual components, as evidenced by our empirical results. Experiments on both static and neuromorphic datasets demonstrate SWformer's effectiveness in capturing spatial-frequency patterns in a multiplication-free and event-driven fashion, outperforming state-of-the-art SNNs. SWformer achieves a 22.03% reduction in parameter count and a 2.52% performance improvement on the ImageNet dataset compared to vanilla Spiking Transformers. The code is available at: https://github.com/bic-L/Spiking-Wavelet-Transformer.
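
To make the "multiplication-free" and "negative spike" ideas in the abstract concrete, here is a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes an unnormalized one-level 2D Haar transform as the wavelet, and the helper `ternary_spike` is a hypothetical firing rule introduced only for illustration. The point is that Haar sub-bands of a binary spike map can be computed with additions and subtractions alone, and that the high-frequency coefficients can be negative, which a purely binary spike cannot represent.

```python
import numpy as np

def haar2d_unnormalized(x):
    """One level of an unnormalized 2D Haar transform (assumed wavelet).

    Uses only additions and subtractions on each 2x2 block.
    x must be a 2D array with even height and width.
    """
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    low = a + b + c + d         # local sum (low-frequency content)
    horiz_diff = a - b + c - d  # left minus right: responds to vertical edges
    vert_diff = a + b - c - d   # top minus bottom: responds to horizontal edges
    diag_diff = a - b - c + d   # diagonal detail
    return low, horiz_diff, vert_diff, diag_diff

def ternary_spike(v, threshold=1):
    """Hypothetical signed firing rule: +1, -1, or 0 (illustration only)."""
    return (v >= threshold).astype(np.int8) - (v <= -threshold).astype(np.int8)

# Binary spike map with a vertical edge between column 0 and column 1.
spikes = np.zeros((4, 4), dtype=np.int8)
spikes[:, 1:] = 1

low, horiz_diff, vert_diff, diag_diff = haar2d_unnormalized(spikes)
signed = ternary_spike(horiz_diff)

print(horiz_diff)  # negative where the edge falls inside a 2x2 block
print(signed)
```

In this toy input, only the left-minus-right sub-band is nonzero, and its coefficients are negative, so a signed spike is needed to pass that edge information on without multiplications.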

Cite

Text

Fang et al. "Spiking Wavelet Transformer." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73116-7_2

Markdown

[Fang et al. "Spiking Wavelet Transformer." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/fang2024eccv-spiking/) doi:10.1007/978-3-031-73116-7_2

BibTeX

@inproceedings{fang2024eccv-spiking,
  title     = {{Spiking Wavelet Transformer}},
  author    = {Fang, Yuetong and Wang, Ziqing and Zhang, Lingfeng and Cao, Jiahang and Chen, Honglei and Xu, Renjing},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73116-7_2},
  url       = {https://mlanthology.org/eccv/2024/fang2024eccv-spiking/}
}