Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization

Abstract

This work analyzes the training dynamics of Image Restoration (IR) Transformers and uncovers a critical yet overlooked issue: conventional Layer Normalization (LN) causes feature magnitudes to diverge to the order of millions and channel-wise entropy to collapse. We interpret this behavior as the network attempting to bypass LN’s constraints, which conflict with the requirements of IR tasks. We identify two misalignments: 1) per-token normalization, which disrupts spatial correlations, and 2) input-independent scaling, which discards input-specific statistics. To address both, we propose Image Restoration Transformer Tailored Layer Normalization (i-LN), a simple drop-in replacement that normalizes features holistically and adaptively rescales them per input. We provide theoretical insights and empirical evidence that this design effectively captures important spatial correlations and better preserves input-specific statistics throughout the network. Experimental results verify that i-LN consistently outperforms vanilla LN on various IR tasks.
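The two misalignments named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the exact formulation, so the function names (`per_token_layer_norm`, `holistic_norm`, `block_with_rescale`), the choice of normalizing over all tokens and channels of one input, and the choice of rescaling by the input's own standard deviation are all assumptions made for illustration. Learnable affine parameters are omitted.

```python
import numpy as np


def per_token_layer_norm(x, eps=1e-6):
    """Conventional LN: each token is normalized over its own channels.

    x has shape (batch, tokens, channels).
    """
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def holistic_norm(x, eps=1e-6):
    """Assumed reading of "holistic" normalization: one mean and one
    standard deviation per input, computed over all tokens and channels.

    Returns the normalized features and the per-input std so that the
    caller can restore the input-specific scale later.
    """
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = np.sqrt(x.var(axis=(1, 2), keepdims=True) + eps)
    return (x - mean) / std, std


def block_with_rescale(x, fn):
    """Hypothetical block: normalize holistically, apply a sub-layer `fn`,
    then rescale the output by the std of this particular input."""
    normed, std = holistic_norm(x)
    return fn(normed) * std


# Two tokens with the same channel pattern but a 10x difference in magnitude.
x = np.array([[[1.0, 2.0, 3.0],
               [10.0, 20.0, 30.0]]])

ln_out = per_token_layer_norm(x)
hol_out, _ = holistic_norm(x)

# Per-token LN maps both tokens to (almost) the same vector, so the relative
# magnitude between spatial positions is lost.
tokens_collapsed = np.allclose(ln_out[0, 0], ln_out[0, 1], atol=1e-4)

# Holistic normalization keeps the two tokens distinct.
tokens_distinct = not np.allclose(hol_out[0, 0], hol_out[0, 1], atol=1e-4)
```

In this toy input, per-token normalization makes the dim and the bright token indistinguishable, while the holistic variant preserves their relative magnitudes. With an identity sub-layer, `block_with_rescale` returns the mean-centered input at its original scale, which is one simple way an input-specific statistic could be carried through a block.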

Cite

Text

Lee et al. "Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization." International Conference on Learning Representations, 2026.

Markdown

[Lee et al. "Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/lee2026iclr-analyzing/)

BibTeX

@inproceedings{lee2026iclr-analyzing,
  title     = {{Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization}},
  author    = {Lee, MinKyu and Hyun, Sangeek and Jun, Woojin and Kim, Hyunjun and Chung, Jiwoo and Heo, Jae-Pil},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/lee2026iclr-analyzing/}
}