GENIE: A Visual-Only Diffusion Framework for Task-Agnostic Image Transformation

Abstract

Designing a unified vision model that handles diverse visual transformation tasks without task-specific modifications remains a significant challenge, particularly when scaling and generalizing beyond narrowly defined objectives. We propose GENIE, a ControlNet-Diffusion framework that performs task-based image generation solely from visual exemplars, eliminating dependence on textual prompts or auxiliary metadata. Unlike conventional prompt-driven diffusion models, GENIE employs a dual visual conditioning mechanism, combining implicit guidance via ControlNet with explicit task encoding through CLIP-based visual arithmetic, to infer task intent directly from reference input-output pairs. To improve semantic alignment between visual exemplars and generated outputs, we introduce a lightweight task consistency loss that encourages representational coherence in the embedding space across transformed pairs. Although not a multitask learner in the classical sense, GENIE switches between tasks without any task-specific changes to its architecture or loss functions. Evaluations across seven vision tasks (inpainting, colorization, edge detection, deblurring, denoising, semantic segmentation, and depth estimation) and two out-of-distribution (OOD) tasks (deraining and contrast enhancement) show that GENIE achieves an average performance gain of 10% over visual-conditioned baselines, demonstrating its effectiveness for scalable, text-free visual generation.
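The abstract does not give formulas for the visual arithmetic or the task consistency loss, so the following is only an illustrative sketch of one way such a loss could look. It assumes (hypothetically) that a task is encoded as the difference between the embeddings of an exemplar output and its input, and that consistency is measured as cosine distance between the exemplar's task vector and the one implied by the generated pair. The plain lists stand in for CLIP image embeddings, and the function names are invented for this example.

```python
import math

def task_vector(emb_in, emb_out):
    # Assumed form of "visual arithmetic": output embedding minus input embedding.
    return [o - i for i, o in zip(emb_in, emb_out)]

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def task_consistency_loss(ex_in, ex_out, query_in, generated):
    # Hypothetical loss: 1 - cosine similarity between the exemplar task
    # vector and the task vector implied by (query, generated output).
    t_ref = task_vector(ex_in, ex_out)
    t_gen = task_vector(query_in, generated)
    return 1.0 - cosine(t_ref, t_gen)

# Toy 3-d vectors standing in for image embeddings.
ex_in, ex_out = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]
query_in = [0.0, 0.0, 1.0]
aligned = [0.0, 1.0, 1.0]     # shifts along the same direction as the exemplar pair
misaligned = [1.0, 0.0, 1.0]  # shifts along an orthogonal direction

loss_aligned = task_consistency_loss(ex_in, ex_out, query_in, aligned)
loss_misaligned = task_consistency_loss(ex_in, ex_out, query_in, misaligned)
```

In this toy setup the aligned output yields a lower loss than the misaligned one, which is the behavior a consistency term of this kind is meant to reward. The actual loss used by GENIE may differ.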

Cite

Text

Singh et al. "GENIE: A Visual-Only Diffusion Framework for Task-Agnostic Image Transformation." Transactions on Machine Learning Research, 2026.

Markdown

[Singh et al. "GENIE: A Visual-Only Diffusion Framework for Task-Agnostic Image Transformation." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/singh2026tmlr-genie/)

BibTeX

@article{singh2026tmlr-genie,
  title     = {{GENIE: A Visual-Only Diffusion Framework for Task-Agnostic Image Transformation}},
  author    = {Singh, Uddeshya and Thomas, Aniket and Agarwal, Aishwarya and Karanam, Srikrishna and Banerjee, Biplab},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/singh2026tmlr-genie/}
}