HQ-Edit: A High-Quality Dataset for Instruction-Based Image Editing

Abstract

This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches that rely on attribute guidance or human feedback to build datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure high quality, diverse examples are first collected online and expanded, then used to create high-quality diptychs featuring input and output images with detailed text prompts, and finally post-processed to guarantee precise alignment. In addition, we propose two evaluation metrics, Alignment and Coherence, to quantitatively assess the quality of image edit pairs using GPT-4V. HQ-Edit's high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing models. For example, an InstructPix2Pix model fine-tuned on HQ-Edit attains state-of-the-art image editing performance, even surpassing models fine-tuned with human-annotated data.
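
For illustration, the sketch below shows how a GPT-4V-based Alignment/Coherence query of the kind described above might be issued with the OpenAI Python SDK. It is not the authors' released code: the prompt wording, the 0-100 score scale, and the model name are assumptions made for this example.

# Minimal sketch (not the paper's implementation) of scoring an edit pair with a
# GPT-4V-capable model. Assumes the `openai` package is installed and
# OPENAI_API_KEY is set; prompt text and score scale are illustrative only.
import base64
from openai import OpenAI

client = OpenAI()

def encode_image(path: str) -> str:
    # Read an image file and return it as a base64 data URL for the API.
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

def score_edit(input_image: str, output_image: str, instruction: str) -> str:
    # Ask the model to rate Alignment (does the edit follow the instruction?)
    # and Coherence (is the output a plausible, artifact-free image?).
    response = client.chat.completions.create(
        model="gpt-4o",  # any GPT-4V-capable model; model choice is an assumption here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    f"Edit instruction: {instruction}\n"
                    "The first image is the input, the second is the edited output.\n"
                    "Rate Alignment and Coherence on a 0-100 scale and reply as "
                    "'Alignment: X, Coherence: Y'."
                )},
                {"type": "image_url", "image_url": {"url": encode_image(input_image)}},
                {"type": "image_url", "image_url": {"url": encode_image(output_image)}},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Hypothetical file names and instruction, for demonstration only.
    print(score_edit("input.png", "output.png", "Turn the cat's fur white"))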

Cite

Text

Hui et al. "HQ-Edit: A High-Quality Dataset for Instruction-Based Image Editing." International Conference on Learning Representations, 2025.

Markdown

[Hui et al. "HQ-Edit: A High-Quality Dataset for Instruction-Based Image Editing." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/hui2025iclr-hqedit/)

BibTeX

@inproceedings{hui2025iclr-hqedit,
  title     = {{HQ-Edit: A High-Quality Dataset for Instruction-Based Image Editing}},
  author    = {Hui, Mude and Yang, Siwei and Zhao, Bingchen and Shi, Yichun and Wang, Heng and Wang, Peng and Xie, Cihang and Zhou, Yuyin},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/hui2025iclr-hqedit/}
}