GAMR: A Guided Attention Model for (visual) Reasoning

Abstract

Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Here, we present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (*GAMR*), which instantiates an active vision theory -- positing that the brain solves complex visual reasoning problems dynamically -- via sequences of attention shifts to select and route task-relevant visual information into memory. Experiments on an array of visual reasoning tasks and datasets demonstrate GAMR's ability to learn visual routines in a robust and sample-efficient manner. In addition, GAMR is shown to be capable of zero-shot generalization on completely novel reasoning tasks. Overall, our work provides computational support for cognitive theories that postulate the need for a critical interplay between attention and memory to dynamically maintain and manipulate task-relevant visual information to solve complex visual reasoning tasks.
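To make the abstract's "sequences of attention shifts that route information into memory" concrete, below is a minimal PyTorch sketch of that kind of loop: a recurrent controller emits an attention query at each step, soft attention pools a feature map, and each attended glimpse is appended to an external memory that a small head then reasons over. All module names, dimensions, and the classification head are illustrative assumptions for exposition, not the authors' implementation.

```python
# A minimal sketch of a guided-attention / memory loop (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GuidedAttentionSketch(nn.Module):
    """At each of n_steps, a recurrent controller emits an attention query,
    soft attention pools the spatial feature map, and the attended vector
    is appended to a memory buffer that a small head reasons over."""

    def __init__(self, feat_dim=128, hidden_dim=256, n_steps=4):
        super().__init__()
        self.n_steps = n_steps
        self.controller = nn.GRUCell(feat_dim, hidden_dim)
        self.query = nn.Linear(hidden_dim, feat_dim)  # guides "where to look" next
        self.head = nn.Sequential(                    # reasons over the memory buffer
            nn.Linear(n_steps * feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),                 # e.g. a same/different decision
        )

    def forward(self, feats):
        # feats: (B, N, feat_dim) -- N spatial locations from a CNN feature map
        h = feats.new_zeros(feats.shape[0], self.controller.hidden_size)
        glimpse = feats.mean(dim=1)  # initial glimpse: global average of the scene
        memory = []
        for _ in range(self.n_steps):
            h = self.controller(glimpse, h)                       # update controller state
            q = self.query(h)                                     # attention query (B, D)
            attn = F.softmax(torch.einsum("bnd,bd->bn", feats, q), dim=1)
            glimpse = torch.einsum("bn,bnd->bd", attn, feats)     # attended glimpse
            memory.append(glimpse)                                # route into memory
        return self.head(torch.cat(memory, dim=1))


feats = torch.randn(8, 49, 128)           # e.g. a flattened 7x7 feature map
logits = GuidedAttentionSketch()(feats)   # (8, 2)
```

The point of the design is the division of labor the abstract emphasizes: attention selects one task-relevant piece of the scene at a time, and reasoning happens over the accumulated memory rather than over the raw image.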

Cite

Text

Vaishnav and Serre. "GAMR: A Guided Attention Model for (visual) Reasoning." International Conference on Learning Representations, 2023.

Markdown

[Vaishnav and Serre. "GAMR: A Guided Attention Model for (visual) Reasoning." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/vaishnav2023iclr-gamr/)

BibTeX

@inproceedings{vaishnav2023iclr-gamr,
  title     = {{GAMR: A Guided Attention Model for (visual) Reasoning}},
  author    = {Vaishnav, Mohit and Serre, Thomas},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/vaishnav2023iclr-gamr/}
}