Learning to Count Visual Objects by Combining "what" and "where" in Recurrent Memory
Abstract
Counting the number of objects in a visual scene is easy for humans but challenging for modern deep neural networks. Here we explore what makes this problem hard and study the neural computations that allow transfer of counting ability to new objects and contexts. Previous work has implicated posterior parietal cortex (PPC) in numerosity perception and in visual scene understanding more broadly. It has been proposed that action-related saccadic signals computed in PPC provide object-invariant information about the number and arrangement of scene elements, and may contribute to relational reasoning in visual displays. Here, we built a glimpsing recurrent neural network that combines gaze contents ("what") and gaze location ("where") to count the number of items in a visual array. The network successfully learns to count and generalizes to several out-of-distribution test sets, including images with novel items. Through ablations and comparison to control models, we establish the contribution of brain-inspired computational principles to this generalization ability. This work provides a proof-of-principle demonstration that a neural network that combines "what" and "where" can learn a generalizable concept of numerosity and points to a promising approach for other visual reasoning tasks.
Cite
Text
Thompson et al. "Learning to Count Visual Objects by Combining "what" and "where" in Recurrent Memory." NeurIPS 2022 Workshops: GMML, 2022.Markdown
[Thompson et al. "Learning to Count Visual Objects by Combining "what" and "where" in Recurrent Memory." NeurIPS 2022 Workshops: GMML, 2022.](https://mlanthology.org/neuripsw/2022/thompson2022neuripsw-learning/)BibTeX
@inproceedings{thompson2022neuripsw-learning,
title = {{Learning to Count Visual Objects by Combining "what" and "where" in Recurrent Memory}},
author = {Thompson, Jessica A F and Sheahan, Hannah and Summerfield, Christopher},
booktitle = {NeurIPS 2022 Workshops: GMML},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/thompson2022neuripsw-learning/}
}