Image and Video Quality Assessment Using Prompt-Guided Latent Diffusion Models for Cross-Dataset Generalization

Abstract

The design of image and video quality assessment (QA) algorithms is essential for benchmarking and calibrating user experience in modern visual systems. A major drawback of state-of-the-art QA methods is their limited ability to generalize across diverse image and video datasets with reasonable distribution shifts. In this work, we leverage the denoising process of diffusion models for generalized image QA (IQA) and video QA (VQA) by understanding the degree of alignment between learnable quality-aware text prompts and images or video frames. In particular, we learn cross-attention maps from intermediate layers of the denoiser of latent diffusion models (LDMs) to capture quality-aware representations of images or video frames. Since applying text-to-image LDMs to every video frame is computationally expensive, we only estimate the quality of a frame-rate subsampled version of the original video. To compensate for the loss of motion information due to frame-rate subsampling, we propose a novel temporal quality modulator. Our extensive cross-database experiments across various user-generated, synthetic, low-light, frame-rate variation, ultra-high-definition, and streaming content-based databases show that our model can achieve superior generalization in both IQA and VQA.
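
As a rough illustration of the cross-attention maps mentioned above, the following minimal sketch computes scaled dot-product attention of spatial latent features over prompt-token embeddings. It is not the authors' implementation: the function name `cross_attention_map`, the toy dimensions, and the use of raw vectors in place of learned query/key projections are all assumptions made for illustration only.

```python
import math

def cross_attention_map(queries, keys):
    """Softmax-normalized attention of each spatial query over prompt tokens.

    queries: list of N spatial feature vectors (each of length d)
    keys:    list of T prompt-token vectors (each of length d)
    returns: N x T list of rows, each row summing to 1
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    attn = []
    for q in queries:
        # Scaled dot product between one spatial location and every token.
        logits = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        attn.append([e / total for e in exps])
    return attn

# Toy example (hypothetical values): a 2x2 latent grid flattened to
# 4 locations, 2 prompt tokens, feature dimension d = 3.
queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
attn = cross_attention_map(queries, keys)

# Column t of attn, reshaped to 2x2, is the spatial map for prompt token t.
token0_map = [row[0] for row in attn]
```

In this toy setting, each column of `attn` indicates how strongly each spatial location aligns with one prompt token; in an actual LDM denoiser the queries and keys would come from learned projections of the latent features and the text-encoder output, respectively.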

Cite

Text

Mitra et al. "Image and Video Quality Assessment Using Prompt-Guided Latent Diffusion Models for Cross-Dataset Generalization." Transactions on Machine Learning Research, 2025.

Markdown

[Mitra et al. "Image and Video Quality Assessment Using Prompt-Guided Latent Diffusion Models for Cross-Dataset Generalization." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/mitra2025tmlr-image/)

BibTeX

@article{mitra2025tmlr-image,
  title     = {{Image and Video Quality Assessment Using Prompt-Guided Latent Diffusion Models for Cross-Dataset Generalization}},
  author    = {Mitra, Shankhanil and De, Diptanu and Rao, Shika and Soundararajan, Rajiv},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/mitra2025tmlr-image/}
}