ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation

Abstract

Dense visual prediction tasks, such as detection and segmentation, are crucial for time-critical applications (e.g., autonomous driving and video surveillance). While deep models achieve strong performance, their efficiency remains a challenge. Knowledge distillation (KD) is an effective model compression technique, but existing feature-based KD methods rely on static, teacher-driven feature selection, failing to adapt to the student's evolving learning state or leverage dynamic student-teacher interactions. To address these limitations, we propose Adaptive student-teacher Cooperative Attention Masking for Knowledge Distillation (ACAM-KD), which introduces two key components: (1) Student-Teacher Cross-Attention Feature Fusion (STCA-FF), which adaptively integrates features from both models for a more interactive distillation process, and (2) Adaptive Spatial-Channel Masking (ASCM), which dynamically generates importance masks to enhance both spatial and channel-wise feature selection. Unlike conventional KD methods, ACAM-KD adapts to the student's evolving needs throughout the entire distillation process. Extensive experiments on multiple benchmarks validate its effectiveness. For instance, on COCO2017, ACAM-KD improves object detection performance by up to 1.4 mAP over the state-of-the-art when distilling a ResNet-50 student from a ResNet-101 teacher. For semantic segmentation on Cityscapes, it boosts mIoU by 3.09 over the baseline with DeepLabV3-MobileNetV2 as the student model.

Cite

Text

Lan and Tian. "ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation." International Conference on Computer Vision, 2025.

Markdown

[Lan and Tian. "ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/lan2025iccv-acamkd/)

BibTeX

@inproceedings{lan2025iccv-acamkd,
  title     = {{ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation}},
  author    = {Lan, Qizhen and Tian, Qing},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {3957--3966},
  url       = {https://mlanthology.org/iccv/2025/lan2025iccv-acamkd/}
}