Benchmarking Progress to Infant-Level Physical Reasoning in AI
Abstract
To what extent do modern AI systems comprehend the physical world? We introduce the open-access Infant-Level Physical Reasoning Benchmark (InfLevel) to gain insight into this question. We evaluate ten neural-network architectures developed for video understanding on tasks designed to test these models' ability to reason about three essential physical principles that researchers have shown to guide human infants' physical understanding. We explore the sensitivity of each AI system to the continuity of objects as they travel through space and time, to the solidity of objects, and to gravity. We find strikingly consistent results across 60 experiments with multiple systems, training regimes, and evaluation metrics: current popular visual-understanding systems are at or near chance on all three principles of physical reasoning. We close by suggesting some potential ways forward.
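The headline finding, performance at or near chance, is the natural baseline when each test item pairs a physically plausible video with a matched implausible one and a model is scored on whether it finds the implausible video more "surprising." The sketch below is a minimal, hypothetical illustration of such a paired-comparison accuracy; the function name, data layout, and surprise scores are assumptions for exposition, not the paper's actual evaluation code.

```python
import numpy as np

def paired_accuracy(surprise_plausible: np.ndarray,
                    surprise_implausible: np.ndarray) -> float:
    """Fraction of pairs in which the implausible video is judged
    more surprising than its matched plausible counterpart.

    A model with no sensitivity to the physical principle being
    tested orders each pair arbitrarily, giving ~0.5 (chance).
    """
    return float(np.mean(surprise_implausible > surprise_plausible))

# Toy usage: random "surprise" scores (e.g., reconstruction errors)
# hover near the 0.5 chance level, the reference point for the
# benchmark's reported results.
rng = np.random.default_rng(0)
plausible = rng.normal(size=1000)
implausible = rng.normal(size=1000)
print(f"paired accuracy: {paired_accuracy(plausible, implausible):.3f}")
```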
Cite
Text
Weihs et al. "Benchmarking Progress to Infant-Level Physical Reasoning in AI." Transactions on Machine Learning Research, 2022.
Markdown
[Weihs et al. "Benchmarking Progress to Infant-Level Physical Reasoning in AI." Transactions on Machine Learning Research, 2022.](https://mlanthology.org/tmlr/2022/weihs2022tmlr-benchmarking/)
BibTeX
@article{weihs2022tmlr-benchmarking,
  title = {{Benchmarking Progress to Infant-Level Physical Reasoning in AI}},
  author = {Weihs, Luca and Yuile, Amanda and Baillargeon, Renée and Fisher, Cynthia and Marcus, Gary and Mottaghi, Roozbeh and Kembhavi, Aniruddha},
  journal = {Transactions on Machine Learning Research},
  year = {2022},
  url = {https://mlanthology.org/tmlr/2022/weihs2022tmlr-benchmarking/}
}