AddMe: Zero-Shot Group-Photo Synthesis by Inserting People into Scenes

Abstract

While large text-to-image diffusion models have made significant progress in high-quality image generation, challenges persist when users try to insert their own portraits into existing photos, especially group photos. Concretely, existing customization methods struggle to place a facial identity at a desired location in an existing image, and existing local image editing methods struggle to handle facial details. To address these limitations, we propose AddMe, a diffusion-based portrait generator that inserts a given portrait at a desired location in an existing scene image in a zero-shot manner. Specifically, we propose a novel identity adapter that learns a facial representation decoupled from the characters already present in the scene. Meanwhile, to ensure that the generated portrait interacts properly with the other people in the scene, we design an enhanced portrait attention module that captures contextual information during the generation process. Our method is compatible with both text and various spatial conditions, enabling precise control over the generated portraits. Extensive experiments demonstrate significant improvements in both performance and efficiency.

Cite

Text

Yue et al. "AddMe: Zero-Shot Group-Photo Synthesis by Inserting People into Scenes." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72661-3_13

Markdown

[Yue et al. "AddMe: Zero-Shot Group-Photo Synthesis by Inserting People into Scenes." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/yue2024eccv-addme/) doi:10.1007/978-3-031-72661-3_13

BibTeX

@inproceedings{yue2024eccv-addme,
  title     = {{AddMe: Zero-Shot Group-Photo Synthesis by Inserting People into Scenes}},
  author    = {Yue, Dongxu and Li, Maomao and Liu, Yunfei and Zeng, Ailing and Yang, Tianyu and Guo, Qin and Li, Yu},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72661-3_13},
  url       = {https://mlanthology.org/eccv/2024/yue2024eccv-addme/}
}