Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization
Abstract
This work analyzes the training dynamics of Image Restoration (IR) Transformers and uncovers a critical yet overlooked issue: conventional LayerNorm (LN) causes feature magnitudes to diverge, reaching the order of millions, and collapses channel-wise entropy. We interpret this behavior as the network attempting to bypass LayerNorm’s constraints, which conflict with the requirements of IR tasks. Specifically, we identify two misalignments: 1) per-token normalization, which disrupts spatial correlations, and 2) input-independent scaling, which discards input-specific statistics. To address them, we propose Image Restoration Transformer Tailored Layer Normalization (i-LN), a simple drop-in replacement that normalizes features holistically and adaptively rescales them per input. We provide theoretical insights and empirical evidence that this design effectively captures important spatial correlations and better preserves input-specific statistics throughout the network. Experimental results verify that the proposed i-LN consistently outperforms vanilla LN on various IR tasks.
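The two misalignments named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give the exact i-LN formulation, so the function names `per_token_layernorm` and `holistic_norm`, the choice to normalize jointly over all tokens and channels of one input, the multiplication by the input's own standard deviation as the "rescaling" step, and the omission of learnable affine parameters are all assumptions made for illustration.

```python
import numpy as np

def per_token_layernorm(x, eps=1e-6):
    # Conventional LN without affine parameters: each token (row)
    # is normalized over its own channels, independently of other tokens.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def holistic_norm(x, eps=1e-6):
    # Assumed "holistic" variant: one mean and one standard deviation
    # computed over all tokens and channels of a single input.
    # Returns the normalized features and the input's own scale.
    mu = x.mean()
    sigma = np.sqrt(x.var() + eps)
    return (x - mu) / sigma, sigma

# Two tokens with the same channel pattern but a 10x magnitude difference.
x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])

ln_out = per_token_layernorm(x)      # both rows become nearly identical
hol_out, sigma = holistic_norm(x)    # relative magnitudes between rows survive
rescaled = hol_out * sigma           # assumed per-input rescaling restores the scale
```

In this toy input, per-token normalization maps both tokens to nearly the same vector, so the magnitude relationship between spatial positions is lost, while the jointly normalized features keep it and the stored `sigma` allows the input-specific scale to be reinstated afterwards.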
Cite
Text
Lee et al. "Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization." International Conference on Learning Representations, 2026.
Markdown
[Lee et al. "Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/lee2026iclr-analyzing/)
BibTeX
@inproceedings{lee2026iclr-analyzing,
title = {{Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization}},
author = {Lee, MinKyu and Hyun, Sangeek and Jun, Woojin and Kim, Hyunjun and Chung, Jiwoo and Heo, Jae-Pil},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/lee2026iclr-analyzing/}
}