AM-Adapter: Appearance Matching Adapter for Exemplar-Based Semantic Image Synthesis In-the-Wild
Abstract
Exemplar-based semantic image synthesis generates images aligned with semantic content while preserving the appearance of an exemplar. Conventional structure-guidance models like ControlNet are limited, as they rely solely on text prompts to control appearance and cannot utilize exemplar images as input. Recent tuning-free approaches address this by transferring local appearance via implicit cross-image matching in the augmented self-attention mechanism of pre-trained diffusion models. However, prior works are often restricted to single-object cases or foreground object appearance transfer, and struggle with complex scenes involving multiple objects. To overcome this, we propose AM-Adapter (Appearance Matching Adapter) to address exemplar-based semantic image synthesis in-the-wild, enabling multi-object appearance transfer from a single scene-level image. AM-Adapter automatically transfers local appearances from the scene-level input. Alternatively, it provides controllability to map user-defined object details to specific locations in the synthesized images. Our learnable framework enhances cross-image matching within augmented self-attention by integrating semantic information from segmentation maps. To disentangle generation and matching, we adopt stage-wise training: we first train the structure-guidance and generation networks, then train the matching adapter while keeping the others frozen. During inference, we introduce an automated exemplar retrieval method for selecting exemplar image-segmentation pairs efficiently. Despite utilizing minimal learnable parameters, AM-Adapter achieves state-of-the-art performance, excelling in both semantic alignment and local appearance fidelity. Extensive ablations validate our design choices. Code and weights will be released.
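For readers unfamiliar with the augmented self-attention that the abstract refers to, the following is a minimal, generic sketch of the idea as commonly used in tuning-free appearance transfer: target queries attend over keys and values concatenated from both the target and the exemplar, so each target location can implicitly match and borrow from exemplar locations. It is not the AM-Adapter implementation and omits the learnable, segmentation-aware matching described above; the function name `augmented_self_attention` and the toy features are assumptions made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def augmented_self_attention(q_tgt, k_tgt, v_tgt, k_ex, v_ex):
    """Hypothetical helper: target queries attend over target AND exemplar tokens.

    Each argument is a list of token vectors (lists of floats).
    Returns (outputs, weights); each weights row covers target tokens first,
    then exemplar tokens.
    """
    keys = k_tgt + k_ex      # concatenate along the token axis
    values = v_tgt + v_ex
    scale = 1.0 / math.sqrt(len(q_tgt[0]))
    outputs, weights = [], []
    for q in q_tgt:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors, one output dimension at a time.
        outputs.append(
            [sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


# Toy data (made up): two target tokens, two exemplar tokens, feature dim 2.
q_tgt = [[1.0, 0.0], [0.0, 1.0]]
k_tgt = [[0.1, 0.1], [0.1, 0.1]]
v_tgt = [[0.0], [0.0]]
k_ex = [[5.0, 0.0], [0.0, 5.0]]
v_ex = [[1.0], [2.0]]

out, weights = augmented_self_attention(q_tgt, k_tgt, v_tgt, k_ex, v_ex)
# Index of the best-matching token per target query (indices 2 and 3 are exemplar tokens).
best = [row.index(max(row)) for row in weights]
```

In this toy setup each target query is most similar to one exemplar key, so most of its attention mass, and therefore most of its output value, comes from that exemplar token. This is the implicit cross-image matching that the abstract says AM-Adapter makes learnable and semantics-aware.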
Cite
Text
Jin et al. "AM-Adapter: Appearance Matching Adapter for Exemplar-Based Semantic Image Synthesis In-the-Wild." International Conference on Computer Vision, 2025.
Markdown
[Jin et al. "AM-Adapter: Appearance Matching Adapter for Exemplar-Based Semantic Image Synthesis In-the-Wild." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/jin2025iccv-amadapter/)
BibTeX
@inproceedings{jin2025iccv-amadapter,
title = {{AM-Adapter: Appearance Matching Adapter for Exemplar-Based Semantic Image Synthesis In-the-Wild}},
author = {Jin, Siyoon and Nam, Jisu and Kim, Jiyoung and Chung, Dahyun and Kim, Yeong-Seok and Park, Joonhyung and Chu, Heonjeong and Kim, Seungryong},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {17077-17086},
url = {https://mlanthology.org/iccv/2025/jin2025iccv-amadapter/}
}