ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation
Abstract
Dense visual prediction tasks, such as detection and segmentation, are crucial for time-critical applications (e.g., autonomous driving and video surveillance). While deep models achieve strong performance, their efficiency remains a challenge. Knowledge distillation (KD) is an effective model compression technique, but existing feature-based KD methods rely on static, teacher-driven feature selection, failing to adapt to the student's evolving learning state or leverage dynamic student-teacher interactions. To address these limitations, we propose Adaptive student-teacher Cooperative Attention Masking for Knowledge Distillation (ACAM-KD), which introduces two key components: (1) Student-Teacher Cross-Attention Feature Fusion (STCA-FF), which adaptively integrates features from both models for a more interactive distillation process, and (2) Adaptive Spatial-Channel Masking (ASCM), which dynamically generates importance masks to enhance both spatial and channel-wise feature selection. Unlike conventional KD methods, ACAM-KD adapts to the student's evolving needs throughout the entire distillation process. Extensive experiments on multiple benchmarks validate its effectiveness. For instance, on COCO2017, ACAM-KD improves object detection performance by up to 1.4 mAP over the state-of-the-art when distilling a ResNet-50 student from a ResNet-101 teacher. For semantic segmentation on Cityscapes, it boosts mIoU by 3.09 over the baseline with DeepLabV3-MobileNetV2 as the student model.
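To make the abstract's two components concrete, below is a minimal PyTorch sketch of the general ideas: cross-attention fusion in which student features query teacher features, followed by learned spatial and channel masks that weight a feature-imitation loss. All names (STCAFusion, ASCMask), shapes, and design choices here are assumptions for illustration only; this is not the authors' released implementation, and the paper's actual formulation may differ.

# Illustrative sketch of cross-attention fusion plus adaptive spatial/channel
# masking. Every module name, shape, and hyperparameter is an assumption made
# for this example, not the authors' code.
import torch
import torch.nn as nn


class STCAFusion(nn.Module):
    """Fuse student and teacher feature maps with cross-attention.

    Student features form the queries; teacher features form the keys and
    values, so the fused result reflects what the student currently needs.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_s.shape
        q = f_s.flatten(2).transpose(1, 2)   # (B, HW, C) student queries
        kv = f_t.flatten(2).transpose(1, 2)  # (B, HW, C) teacher keys/values
        fused, _ = self.attn(q, kv, kv)
        return fused.transpose(1, 2).view(b, c, h, w)


class ASCMask(nn.Module):
    """Produce spatial and channel importance masks from the fused features."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, 1, kernel_size=1)
        self.channel = nn.Linear(channels, channels)

    def forward(self, fused: torch.Tensor):
        m_sp = torch.sigmoid(self.spatial(fused))                    # (B, 1, H, W)
        pooled = fused.mean(dim=(2, 3))                              # (B, C)
        m_ch = torch.sigmoid(self.channel(pooled))[..., None, None]  # (B, C, 1, 1)
        return m_sp, m_ch


def masked_distill_loss(f_s, f_t, fusion: STCAFusion, masker: ASCMask):
    """Feature-imitation loss weighted by the adaptive masks.

    A real training setup would normalize or regularize the masks (and train
    them jointly with the task loss) so they cannot collapse to zero; that is
    omitted here for brevity.
    """
    m_sp, m_ch = masker(fusion(f_s, f_t.detach()))
    return ((f_s - f_t.detach()) ** 2 * m_sp * m_ch).mean()


if __name__ == "__main__":
    f_s = torch.randn(2, 256, 32, 32)  # student feature map
    f_t = torch.randn(2, 256, 32, 32)  # teacher map, same shape assumed
    fusion, masker = STCAFusion(256), ASCMask(256)
    print(masked_distill_loss(f_s, f_t, fusion, masker).item())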
Cite
Text
Lan and Tian. "ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation." International Conference on Computer Vision, 2025.
Markdown
[Lan and Tian. "ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/lan2025iccv-acamkd/)
BibTeX
@inproceedings{lan2025iccv-acamkd,
  title     = {{ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation}},
  author    = {Lan, Qizhen and Tian, Qing},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {3957--3966},
  url       = {https://mlanthology.org/iccv/2025/lan2025iccv-acamkd/}
}