Understanding Self-Attention of Self-Supervised Audio Transformers

Abstract

Self-supervised Audio Transformers (SAT) have achieved great success in many downstream speech applications such as ASR, but how they work has not yet been widely explored. In this work, we present multiple strategies for analyzing the attention mechanisms in SAT. We categorize attention heads into explainable categories and discover that each category possesses its own unique functionality. We provide a visualization tool for understanding multi-head self-attention, importance-ranking strategies for identifying critical attention heads, and attention-refinement techniques that improve model performance.
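
The paper's own visualization tool is not reproduced here; as a rough, hypothetical sketch of what inspecting multi-head self-attention involves, the snippet below computes scaled dot-product attention weights from toy queries and keys (standing in for a real SAT layer's projections) and renders each head's map as a heatmap. PyTorch and matplotlib are assumed; all names are illustrative.

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

def attention_maps(q, k):
    """Scaled dot-product attention weights, shape (heads, T, T)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1)

# Toy example: 4 heads, 50 frames, 16 dimensions per head.
torch.manual_seed(0)
q = torch.randn(4, 50, 16)
k = torch.randn(4, 50, 16)
attn = attention_maps(q, k)

# One heatmap per head: rows are query frames, columns are key frames.
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for h, ax in enumerate(axes):
    ax.imshow(attn[h].numpy(), cmap="viridis", origin="lower")
    ax.set_title(f"head {h}")
    ax.set_xlabel("key frame")
    ax.set_ylabel("query frame")
plt.tight_layout()
plt.show()

With a trained model's attention weights in place of the toy tensors, such heatmaps tend to show distinct structures per head (e.g., diagonal or vertical stripes), which is the kind of pattern the paper's categorization of attention heads captures.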

Cite

Text

Yang et al. "Understanding Self-Attention of Self-Supervised Audio Transformers." ICML 2020 Workshops: SAS, 2020.

Markdown

[Yang et al. "Understanding Self-Attention of Self-Supervised Audio Transformers." ICML 2020 Workshops: SAS, 2020.](https://mlanthology.org/icmlw/2020/yang2020icmlw-understanding/)

BibTeX

@inproceedings{yang2020icmlw-understanding,
  title     = {{Understanding Self-Attention of Self-Supervised Audio Transformers}},
  author    = {Yang, Shu-wen and Liu, Andy T. and Lee, Hung-yi},
  booktitle = {ICML 2020 Workshops: SAS},
  year      = {2020},
  url       = {https://mlanthology.org/icmlw/2020/yang2020icmlw-understanding/}
}