Emu Edit: Precise Image Editing via Recognition and Generation Tasks

Abstract

Instruction-based image editing holds immense potential for a variety of applications, as it enables users to perform any editing operation using a natural language instruction. However, current models in this domain often struggle with accurately executing user instructions. We present Emu Edit, a multi-task image editing model which sets state-of-the-art results in instruction-based image editing. To develop Emu Edit, we train it to multi-task across an unprecedented range of tasks, such as region-based editing, free-form editing, and Computer Vision tasks, all of which are formulated as generative tasks. Additionally, to enhance Emu Edit's multi-task learning abilities, we provide it with learned task embeddings which guide the generation process towards the correct edit type. Both these elements are essential for Emu Edit's outstanding performance. Furthermore, we show that Emu Edit can generalize to new tasks, such as image inpainting, super-resolution, and compositions of editing tasks, with just a few labeled examples. This capability offers a significant advantage in scenarios where high-quality samples are scarce. Lastly, to facilitate a more rigorous and informed assessment of instructable image editing models, we release a new challenging and versatile benchmark that includes seven different image editing tasks.

Cite

Text

Sheynin et al. "Emu Edit: Precise Image Editing via Recognition and Generation Tasks." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00847

Markdown

[Sheynin et al. "Emu Edit: Precise Image Editing via Recognition and Generation Tasks." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/sheynin2024cvpr-emu/) doi:10.1109/CVPR52733.2024.00847

BibTeX

@inproceedings{sheynin2024cvpr-emu,
  title     = {{Emu Edit: Precise Image Editing via Recognition and Generation Tasks}},
  author    = {Sheynin, Shelly and Polyak, Adam and Singer, Uriel and Kirstain, Yuval and Zohar, Amit and Ashual, Oron and Parikh, Devi and Taigman, Yaniv},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {8871--8879},
  doi       = {10.1109/CVPR52733.2024.00847},
  url       = {https://mlanthology.org/cvpr/2024/sheynin2024cvpr-emu/}
}