Toward Computer Vision Systems That Understand Real-World Assembly Processes
Abstract
Many applications of computer vision require robust systems that can parse complex structures as they evolve in time. Using a block construction task as a case study, we illustrate the main components involved in building such systems. We evaluate performance at three increasingly detailed levels of spatial granularity on two multimodal (RGBD + IMU) datasets. On the first, designed to match the assumptions of the model, we report better than 90% accuracy at the finest level of granularity. On the second, designed to test the robustness of our model under adverse, real-world conditions, we report 67% accuracy and 91% precision at the mid-level of granularity. We show that this seemingly simple process presents many opportunities to expand the frontiers of computer vision and action recognition.
Cite
Text
Jones et al. "Toward Computer Vision Systems That Understand Real-World Assembly Processes." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019. doi:10.1109/WACV.2019.00051
Markdown
[Jones et al. "Toward Computer Vision Systems That Understand Real-World Assembly Processes." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019.](https://mlanthology.org/wacv/2019/jones2019wacv-computer/) doi:10.1109/WACV.2019.00051
BibTeX
@inproceedings{jones2019wacv-computer,
title = {{Toward Computer Vision Systems That Understand Real-World Assembly Processes}},
author = {Jones, Jonathan D. and Hager, Gregory D. and Khudanpur, Sanjeev},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2019},
pages = {426--434},
doi = {10.1109/WACV.2019.00051},
url = {https://mlanthology.org/wacv/2019/jones2019wacv-computer/}
}