Toward Computer Vision Systems That Understand Real-World Assembly Processes

Abstract

Many applications of computer vision require robust systems that can parse complex structures as they evolve in time. Using a block construction task as a case study, we illustrate the main components involved in building such systems. We evaluate performance at three increasingly detailed levels of spatial granularity on two multimodal (RGBD + IMU) datasets. On the first, designed to match the assumptions of the model, we report better than 90% accuracy at the finest level of granularity. On the second, designed to test the robustness of our model under adverse, real-world conditions, we report 67% accuracy and 91% precision at the mid-level of granularity. We show that this seemingly simple process presents many opportunities to expand the frontiers of computer vision and action recognition.

Cite

Text

Jones et al. "Toward Computer Vision Systems That Understand Real-World Assembly Processes." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019. doi:10.1109/WACV.2019.00051

Markdown

[Jones et al. "Toward Computer Vision Systems That Understand Real-World Assembly Processes." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019.](https://mlanthology.org/wacv/2019/jones2019wacv-computer/) doi:10.1109/WACV.2019.00051

BibTeX

@inproceedings{jones2019wacv-computer,
  title     = {{Toward Computer Vision Systems That Understand Real-World Assembly Processes}},
  author    = {Jones, Jonathan D. and Hager, Gregory D. and Khudanpur, Sanjeev},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2019},
  pages     = {426--434},
  doi       = {10.1109/WACV.2019.00051},
  url       = {https://mlanthology.org/wacv/2019/jones2019wacv-computer/}
}