Named Entity Driven Zero-Shot Image Manipulation

Abstract

We introduced StyleEntity, a zero-shot image manipulation model that utilizes named entities as proxies during its training phase. This strategy enables our model to manipulate images using unseen textual descriptions during inference, all within a single training phase. Additionally, we proposed an inference technique termed Prompt Ensemble Latent Averaging (PELA). PELA averages the manipulation directions derived from various named entities during inference, effectively eliminating the noise directions and thus achieving stable manipulation. In our experiments, StyleEntity exhibited superior performance in a zero-shot setting compared to other methods. The code, model weights, and datasets are available at https://github.com/feng-zhida/StyleEntity.
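
The averaging idea behind PELA can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each prompt yields a manipulation direction made up of a shared component plus prompt-specific noise, and it represents latents as plain Python lists. The function names `average_directions` and `apply_direction`, the 4-dimensional toy latent, and the Gaussian noise model are all hypothetical and chosen only for illustration.

```python
import random


def average_directions(directions):
    """Element-wise mean of equal-length direction vectors (hypothetical helper)."""
    if not directions:
        raise ValueError("need at least one direction")
    dim = len(directions[0])
    if any(len(d) != dim for d in directions):
        raise ValueError("all directions must have the same length")
    n = len(directions)
    return [sum(d[i] for d in directions) / n for i in range(dim)]


def apply_direction(latent, direction, strength=1.0):
    """Shift a latent code along a direction (hypothetical helper)."""
    return [z + strength * d for z, d in zip(latent, direction)]


# Toy setup (an assumption, not taken from the paper): every per-prompt
# direction equals a shared direction plus independent Gaussian noise.
rng = random.Random(0)
shared = [1.0, 0.0, -1.0, 0.5]
per_prompt = [[s + rng.gauss(0.0, 0.5) for s in shared] for _ in range(200)]

averaged = average_directions(per_prompt)


def l2_error(v):
    """Euclidean distance between a direction and the shared direction."""
    return sum((a - b) ** 2 for a, b in zip(v, shared)) ** 0.5


single_error = l2_error(per_prompt[0])   # error of one noisy direction
averaged_error = l2_error(averaged)      # error after averaging the ensemble

# Apply the averaged direction to a toy latent code.
edited = apply_direction([0.0, 0.0, 0.0, 0.0], averaged, strength=1.0)
```

Under this toy noise model, the independent noise terms tend to cancel in the mean, so the averaged direction lies closer to the shared direction than a typical single-prompt direction does. How StyleEntity actually derives its directions from named entities is specified in the paper and the linked repository, not in this sketch.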

Cite

Text

Feng et al. "Named Entity Driven Zero-Shot Image Manipulation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00870

Markdown

[Feng et al. "Named Entity Driven Zero-Shot Image Manipulation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/feng2024cvpr-named/) doi:10.1109/CVPR52733.2024.00870

BibTeX

@inproceedings{feng2024cvpr-named,
  title     = {{Named Entity Driven Zero-Shot Image Manipulation}},
  author    = {Feng, Zhida and Chen, Li and Tian, Jing and Liu, JiaXiang and Feng, Shikun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {9110-9119},
  doi       = {10.1109/CVPR52733.2024.00870},
  url       = {https://mlanthology.org/cvpr/2024/feng2024cvpr-named/}
}