GAMR: A Guided Attention Model for (visual) Reasoning
Abstract
Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes. Here, we present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR), which instantiates an active vision theory, positing that the brain solves complex visual reasoning problems dynamically via sequences of attention shifts that select and route task-relevant visual information into memory. Experiments on an array of visual reasoning tasks and datasets demonstrate GAMR's ability to learn visual routines in a robust and sample-efficient manner. In addition, GAMR is capable of zero-shot generalization to completely novel reasoning tasks. Overall, our work provides computational support for cognitive theories that postulate a critical interplay between attention and memory to dynamically maintain and manipulate task-relevant visual information when solving complex visual reasoning tasks.
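The core mechanism the abstract describes, sequential attention shifts that select visual features and route them into memory, can be sketched schematically. The sketch below is an illustrative assumption, not the paper's actual architecture: the function name, shapes, and dot-product scoring are all placeholders chosen for clarity.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def guided_attention_step(features, query, memory):
    """One hypothetical attention shift (illustrative, not GAMR's implementation):
    score feature locations against a query, pool the attended feature,
    and route the result into an external memory list."""
    scores = features @ query        # (N,) relevance of each spatial location
    weights = softmax(scores)        # attention distribution over locations
    attended = weights @ features    # (D,) feature vector selected by attention
    memory.append(attended)          # store the selection for later reasoning
    return attended, weights
```

A reasoning model in this spirit would repeat such steps with learned, task-dependent queries, then read the accumulated memory out to produce a decision.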
Cite

Vaishnav and Serre. "GAMR: A Guided Attention Model for (visual) Reasoning." International Conference on Learning Representations, 2023.

BibTeX:
@inproceedings{vaishnav2023iclr-gamr,
title = {{GAMR: A Guided Attention Model for (visual) Reasoning}},
author = {Vaishnav, Mohit and Serre, Thomas},
booktitle = {International Conference on Learning Representations},
year = {2023},
url = {https://mlanthology.org/iclr/2023/vaishnav2023iclr-gamr/}
}