Image Manipulation via Neuro-Symbolic Networks

Abstract

Image manipulation via natural language text is an extremely useful task for multiple AI applications, but it requires complex reasoning over multi-modal spaces. Neuro-symbolic approaches have been quite effective in solving such tasks, as they offer better modularity, interpretability, and generalizability. One noteworthy such approach is NSCL [10], developed for the task of Visual Question Answering (VQA). We extend NSCL to the image manipulation task and propose a solution referred to as NEUROSIM. Unlike previous works, which either require supervised training data or can only deal with simple reasoning instructions over single-object scenes, NEUROSIM can perform complex multi-hop reasoning over multi-object scenes and requires only weak supervision in the form of annotated data for the VQA task. On the language side, NEUROSIM contains neural modules that parse an instruction into a symbolic program over a Domain Specific Language (DSL) whose operations guide the manipulation. On the perceptual side, NEUROSIM contains neural modules that first generate a scene graph of the input image and then change the scene graph representation according to the parsed instruction. To train these modules, we design novel loss functions capable of testing the correctness of manipulated object and scene graph representations via query networks. An image decoder is trained to render the final image from the manipulated scene graph representation. Extensive experiments demonstrate that NEUROSIM is highly competitive with state-of-the-art supervised baselines.
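The parse-then-manipulate pipeline described in the abstract can be illustrated with a small, purely symbolic toy. This is only a sketch under stated assumptions: the operation names (`filter`, `change`, `remove`), the `SceneObject` attributes, and the `run_program` helper are hypothetical and are not taken from the paper. NEUROSIM itself operates on learned neural representations of objects rather than on explicit attribute values.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SceneObject:
    # Hypothetical explicit attributes; the real system uses learned embeddings.
    color: str
    shape: str
    size: str


# A program is a list of (operation name, arguments) steps (hypothetical DSL).
Program = List[Tuple[str, Dict[str, str]]]


def run_program(scene: List[SceneObject], program: Program) -> List[SceneObject]:
    """Execute a toy symbolic program over a list of scene objects."""
    # Start with every object selected; "filter" steps narrow the selection.
    selected = set(range(len(scene)))
    for op, args in program:
        if op == "filter":
            # Keep only selected objects whose attributes match all arguments.
            selected = {
                i for i in selected
                if all(getattr(scene[i], k) == v for k, v in args.items())
            }
        elif op == "change":
            # Overwrite the given attributes on the selected objects.
            scene = [
                replace(obj, **args) if i in selected else obj
                for i, obj in enumerate(scene)
            ]
        elif op == "remove":
            # Drop the selected objects, then reset the selection.
            scene = [obj for i, obj in enumerate(scene) if i not in selected]
            selected = set(range(len(scene)))
        else:
            raise ValueError(f"unknown operation: {op}")
    return scene


# Toy instruction: "change the color of the small cube to red".
scene = [
    SceneObject(color="blue", shape="cube", size="small"),
    SceneObject(color="green", shape="sphere", size="small"),
    SceneObject(color="blue", shape="cube", size="large"),
]
program: Program = [
    ("filter", {"size": "small"}),
    ("filter", {"shape": "cube"}),
    ("change", {"color": "red"}),
]
edited = run_program(scene, program)
```

In this toy, the two chained `filter` steps stand in for multi-hop reasoning over a multi-object scene; in the system the abstract describes, the manipulated scene representation would then be passed to an image decoder rather than inspected directly.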

Cite

Text

Singh et al. "Image Manipulation via Neuro-Symbolic Networks." NeurIPS 2022 Workshops: nCSI, 2022.

Markdown

[Singh et al. "Image Manipulation via Neuro-Symbolic Networks." NeurIPS 2022 Workshops: nCSI, 2022.](https://mlanthology.org/neuripsw/2022/singh2022neuripsw-image/)

BibTeX

@inproceedings{singh2022neuripsw-image,
  title     = {{Image Manipulation via Neuro-Symbolic Networks}},
  author    = {Singh, Harman and Garg, Poorva and Gupta, Mohit and Shah, Kevin and Mondal, Arnab Kumar and Khandelwal, Dinesh and Singla, Parag and Garg, Dinesh},
  booktitle = {NeurIPS 2022 Workshops: nCSI},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/singh2022neuripsw-image/}
}