From SoftMax to Score: Transformers Can Effectively Implement In-Context Denoising Steps
Abstract
Transformers have emerged as powerful meta-learners, with growing evidence that they implement learning algorithms within their forward pass. We study this phenomenon in the context of denoising, presenting a unified framework showing that Transformers can implement (a) manifold denoising via Laplacian flows, (b) score-based denoising from diffusion models, and (c) a generalized form of anisotropic diffusion denoising. Our theory establishes an exact equivalence between Transformer attention updates and these algorithms. Empirically, we validate these findings on image denoising tasks, showing that even simple Transformers can perform robust denoising both with and without context. These results illustrate the Transformer’s flexibility as a denoising meta-learner. Code is available at https://github.com/paulrosu11/Transformers_are_Diffusion_Denoisers.
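To make the claimed correspondence concrete, the following is a minimal numerical sketch, not the paper's construction: if query, key, and value are all set to the input point cloud X (an illustrative simplification), one softmax self-attention update X ← X + η(softmax(XXᵀ/τ)X − X) is a discrete Euler step of a Laplacian-style flow on the affinity graph of the points, and iterating it pulls noisy samples toward an underlying manifold. The function names, the unit-circle example, and the parameters τ and η below are assumptions chosen for illustration.

import numpy as np

def softmax(z, axis=-1):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_denoise_step(X, tau=1.0, step=1.0):
    # One self-attention update read as a denoising step (illustrative,
    # with query = key = value = X; not necessarily the paper's setup).
    # A = softmax(X X^T / tau) is a row-stochastic affinity kernel, so
    # X + step * (A X - X) is a discrete step of the Laplacian-style
    # flow dX/dt = -(I - A) X on the point cloud X.
    A = softmax(X @ X.T / tau, axis=-1)   # row-stochastic kernel weights
    return X + step * (A @ X - X)         # move each point toward its kernel-weighted mean

# Noisy samples near a 1-D manifold (the unit circle) in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=256)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1) + 0.1 * rng.normal(size=(256, 2))

for _ in range(5):
    X = attention_denoise_step(X, tau=0.1, step=0.5)

# Points contract toward the circle: mean radial deviation shrinks.
print(np.abs(np.linalg.norm(X, axis=1) - 1.0).mean())

Running the loop drives the average radial deviation down, which is the in-context-denoising behavior the abstract describes in its simplest form; the score-based and anisotropic variants in the paper replace this kernel-averaging step with updates matching those algorithms.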
Cite
Text
Rosu et al. "From SoftMax to Score: Transformers Can Effectively Implement In-Context Denoising Steps." Advances in Neural Information Processing Systems, 2025.
Markdown
[Rosu et al. "From SoftMax to Score: Transformers Can Effectively Implement In-Context Denoising Steps." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/rosu2025neurips-softmax/)
BibTeX
@inproceedings{rosu2025neurips-softmax,
  title = {{From SoftMax to Score: Transformers Can Effectively Implement In-Context Denoising Steps}},
  author = {Rosu, Paul and Carin, Lawrence and Cheng, Xiang},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2025},
  url = {https://mlanthology.org/neurips/2025/rosu2025neurips-softmax/}
}