Diffusion-4k: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models

Abstract

We present Diffusion-4K, a novel framework for direct ultra-high-resolution image synthesis using text-to-image diffusion models. Its core contributions are: (1) Aesthetic-4K Benchmark: to address the absence of a publicly available 4K image synthesis dataset, we construct Aesthetic-4K, a comprehensive benchmark for ultra-high-resolution image generation. We curate a high-quality 4K dataset of carefully selected images with captions generated by GPT-4o. Additionally, we introduce GLCM Score and compression ratio metrics to evaluate fine details, and combine them with holistic measures such as FID, Aesthetics, and CLIPScore for a comprehensive assessment of ultra-high-resolution images. (2) Wavelet-based Fine-tuning: we propose a wavelet-based fine-tuning approach for direct training on photorealistic 4K images. The approach is applicable to various latent diffusion models and proves effective at synthesizing highly detailed 4K images. As a result, Diffusion-4K achieves strong performance in high-quality image synthesis and text prompt adherence, especially when powered by modern large-scale diffusion models (e.g., SD3-2B and Flux-12B). Extensive experimental results on our benchmark demonstrate the superiority of Diffusion-4K in ultra-high-resolution image synthesis. Code is available at https://github.com/zhang0jhon/diffusion-4k.
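
The abstract does not say which wavelet is used or how it enters the fine-tuning objective. As a rough illustration of the general idea behind wavelet-based methods, the sketch below applies a single-level 2D Haar transform, which splits an image into one low-frequency sub-band and three high-frequency detail sub-bands. The choice of the Haar wavelet, the function name `haar_dwt2`, and the sub-band names are assumptions made for this example and are not taken from the paper or its code.

```python
def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform (illustrative sketch only).

    `image` is a list of rows with even height and width.
    Returns four sub-bands, each half the input size in both dimensions:
    low (local averages), row_diff, col_diff, diag_diff (detail bands).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")

    low, row_diff, col_diff, diag_diff = [], [], [], []
    for r in range(0, rows, 2):
        low_r, row_r, col_r, diag_r = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block of pixels handled in this step.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            low_r.append((a + b + d + e) / 2)   # scaled block average
            row_r.append((a + b - d - e) / 2)   # top row minus bottom row
            col_r.append((a - b + d - e) / 2)   # left column minus right column
            diag_r.append((a - b - d + e) / 2)  # diagonal difference
        low.append(low_r)
        row_diff.append(row_r)
        col_diff.append(col_r)
        diag_diff.append(diag_r)
    return low, row_diff, col_diff, diag_diff


# A flat patch has no detail: all of its energy lands in the low band.
flat = haar_dwt2([[4, 4], [4, 4]])

# A checkerboard patch has its fine structure in the diagonal detail band.
checker = haar_dwt2([[1, 0], [0, 1]])
```

In this decomposition, fine textures show up in the detail sub-bands, which is why wavelet-domain objectives are a natural fit when the goal is to preserve high-frequency detail at 4K resolution.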

Cite

Text

Zhang et al. "Diffusion-4k: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02185

Markdown

[Zhang et al. "Diffusion-4k: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/zhang2025cvpr-diffusion4k/) doi:10.1109/CVPR52734.2025.02185

BibTeX

@inproceedings{zhang2025cvpr-diffusion4k,
  title     = {{Diffusion-4k: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models}},
  author    = {Zhang, Jinjin and Huang, Qiuyu and Liu, Junjie and Guo, Xiefan and Huang, Di},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {23464--23473},
  doi       = {10.1109/CVPR52734.2025.02185},
  url       = {https://mlanthology.org/cvpr/2025/zhang2025cvpr-diffusion4k/}
}