HiGarment: Cross-Modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image

Abstract

Diffusion-based garment synthesis tasks primarily focus on the design phase in the fashion domain, while the garment production process remains largely underexplored. To bridge this gap, we introduce a new task: Flat Sketch to Realistic Garment Image (FS2RG), which generates realistic garment images by integrating flat sketches and textual guidance. FS2RG presents two key challenges: 1) fabric characteristics are solely guided by textual prompts, providing insufficient visual supervision for diffusion-based models, which limits their ability to capture fine-grained fabric details; 2) flat sketches and textual guidance may provide conflicting information, requiring the model to selectively preserve or modify garment attributes while maintaining structural coherence. To tackle this task, we propose HiGarment, a novel framework that comprises two core components: i) a multi-modal semantic enhancement mechanism that enhances fabric representation across textual and visual modalities, and ii) a harmonized cross-attention mechanism that dynamically balances information from flat sketches and text prompts, allowing controllable synthesis by generating either sketch-aligned (image-biased) or text-guided (text-biased) outputs. Furthermore, we collect Multi-modal Detailed Garment, the largest open-source dataset for garment generation. Experimental results and user studies demonstrate the effectiveness of HiGarment in garment synthesis. The code and dataset are available at https://github.com/Maple498/HiGarment.
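The abstract does not give the formulation of the harmonized cross-attention mechanism, so the following is only a minimal sketch of the general idea of balancing two conditioning sources, not the paper's method. It assumes a fixed scalar weight where the paper describes a dynamic balance, and every name in it (`cross_attention`, `blended_cross_attention`, `text_bias`) is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Single-head scaled dot-product cross-attention without learned projections.

    queries: (n_q, d) latent tokens; context: (n_c, d) conditioning tokens.
    """
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))
    return weights @ context


def blended_cross_attention(latents, sketch_tokens, text_tokens, text_bias):
    """Blend sketch-conditioned and text-conditioned attention outputs.

    ASSUMPTION: a fixed scalar in [0, 1] stands in for the paper's dynamic
    balancing. 0.0 gives a purely sketch-driven (image-biased) result and
    1.0 gives a purely text-driven (text-biased) result.
    """
    if not 0.0 <= text_bias <= 1.0:
        raise ValueError("text_bias must lie in [0, 1]")
    sketch_out = cross_attention(latents, sketch_tokens)
    text_out = cross_attention(latents, text_tokens)
    return (1.0 - text_bias) * sketch_out + text_bias * text_out


# Toy inputs: 4 latent tokens, 6 sketch tokens, 5 text tokens, width 8.
rng = np.random.default_rng(0)
latents = rng.normal(size=(4, 8))
sketch_tokens = rng.normal(size=(6, 8))
text_tokens = rng.normal(size=(5, 8))

image_biased = blended_cross_attention(latents, sketch_tokens, text_tokens, 0.0)
text_biased = blended_cross_attention(latents, sketch_tokens, text_tokens, 1.0)
balanced = blended_cross_attention(latents, sketch_tokens, text_tokens, 0.5)
```

In this toy version, the two extreme settings reduce to plain cross-attention over a single modality, which loosely mirrors the image-biased and text-biased outputs the abstract mentions.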

Cite

Text

Guo et al. "HiGarment: Cross-Modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image." International Conference on Computer Vision, 2025.

Markdown

[Guo et al. "HiGarment: Cross-Modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/guo2025iccv-higarment/)

BibTeX

@inproceedings{guo2025iccv-higarment,
  title     = {{HiGarment: Cross-Modal Harmony Based Diffusion Model for Flat Sketch to Realistic Garment Image}},
  author    = {Guo, Junyi and Zhang, Jingxuan and Wu, Fangyu and Lu, Huanda and Wang, Qiufeng and Yang, Wenmian and Lim, Eng Gee and Lu, Dongming},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {18542--18551},
  url       = {https://mlanthology.org/iccv/2025/guo2025iccv-higarment/}
}