SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization

Abstract

We propose a novel diffusion-based framework for automatic colorization of Anime-style facial sketches, which preserves the structural fidelity of the input sketch while effectively transferring stylistic attributes from a reference image. Our approach builds upon recent continuous-time diffusion models, but departs from traditional methods that rely on predefined noise schedules, which often fail to maintain perceptual consistency across the generative trajectory. To address this, we introduce SSIMBaD (Sigma Scaling with SSIM-Guided Balanced Diffusion), a sigma-space transformation that ensures linear alignment of perceptual degradation, as measured by structural similarity. This perceptual scaling enforces uniform visual difficulty across timesteps, enabling more balanced and faithful reconstructions. On a large-scale Anime face dataset, SSIMBaD attains state-of-the-art structural fidelity and strong perceptual quality, with robust generalization to diverse styles and structural variations.
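As a rough illustration of what "linear alignment of perceptual degradation" could mean in practice, the sketch below measures how SSIM decays as Gaussian noise of scale sigma is added to an image, then picks noise levels whose SSIM values are evenly spaced. It is not the authors' transformation. It assumes a simplified single-window (global) SSIM, unclipped additive noise, and an empirical inversion by interpolation; the function names `global_ssim`, `ssim_curve`, and `balanced_sigmas` are hypothetical.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def ssim_curve(image, noise, sigmas):
    """SSIM between the clean image and image + sigma * noise, for each sigma.

    The same noise field is reused for every sigma so the curve is smooth.
    """
    return np.array([global_ssim(image, image + s * noise) for s in sigmas])


def balanced_sigmas(sigmas, curve, n_steps):
    """Choose n_steps noise levels whose SSIM values are evenly spaced.

    Assumes `sigmas` is increasing and `curve` is (nearly) decreasing.
    """
    # Force a non-increasing curve so that it can be inverted.
    mono = np.minimum.accumulate(curve)
    # Evenly spaced SSIM targets, from the cleanest to the noisiest level.
    targets = np.linspace(mono[0], mono[-1], n_steps)
    # np.interp expects increasing x-coordinates, so reverse both arrays.
    log_sig = np.interp(targets, mono[::-1], np.log(sigmas)[::-1])
    return np.exp(log_sig)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))  # synthetic test image
    noise = rng.standard_normal(image.shape)
    grid = np.logspace(-2, 1, 50)                         # candidate sigmas
    schedule = balanced_sigmas(grid, ssim_curve(image, noise, grid), n_steps=10)
    print(schedule)
```

Under these assumptions, each step of the resulting schedule removes roughly the same amount of SSIM, which is one way to read "uniform visual difficulty across timesteps". A real pipeline would more likely use a windowed SSIM averaged over many training images.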

Cite

Text

Seo et al. "SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization." Advances in Neural Information Processing Systems, 2025.

Markdown

[Seo et al. "SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/seo2025neurips-ssimbad/)

BibTeX

@inproceedings{seo2025neurips-ssimbad,
  title     = {{SSIMBaD: Sigma Scaling with SSIM-Guided Balanced Diffusion for AnimeFace Colorization}},
  author    = {Seo, Junpyo and Koo, Hanbin and Yook, Jieun and Moon, Byung-Ro},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/seo2025neurips-ssimbad/}
}