Reproducibility Study of "Improving Interpretation Faithfulness for Vision Transformers"
Abstract
This paper attempts to reproduce the findings of the study "Improving Interpretation Faithfulness for Vision Transformers" (Hu et al., 2024). The authors focus on making vision transformers (ViTs) more robust to adversarial attacks and call these robust ViTs faithful ViTs (FViTs). They propose a universal method for transforming ViTs into FViTs, called diffusion denoised smoothing (DDS). Our reproduction of the authors' study encountered several challenges, but the main claims still hold. Furthermore, this study extends the original paper by trying different diffusion models for DDS and by examining how well the increased robustness of FViTs generalizes.
Cite
Text
Changlani et al. "Reproducibility Study of 'Improving Interpretation Faithfulness for Vision Transformers'." Transactions on Machine Learning Research, 2025.
Markdown
[Changlani et al. "Reproducibility Study of 'Improving Interpretation Faithfulness for Vision Transformers'." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/changlani2025tmlr-reproducibility/)
BibTeX
@article{changlani2025tmlr-reproducibility,
title = {{Reproducibility Study of "Improving Interpretation Faithfulness for Vision Transformers"}},
author = {Changlani, Meher and Hucko, Benjamin and Kechagias, Ioannis and Mahadevan, Aswin Krishna},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/changlani2025tmlr-reproducibility/}
}