ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data
Abstract
Compared with traditional RGB-only visual tracking, few datasets have been constructed for RGB-D tracking. In this paper, we propose ARKitTrack, a new RGB-D tracking dataset for both static and dynamic scenes, captured with the consumer-grade LiDAR scanners built into Apple's iPhone and iPad. ARKitTrack contains 300 RGB-D sequences, 455 targets, and 229.7K video frames in total. Along with bounding box annotations and frame-level attributes, we also annotate this dataset with 123.9K pixel-level target masks. In addition, the camera intrinsics and camera pose of each frame are provided for future developments. To demonstrate the potential usefulness of this dataset, we further present a unified baseline for both box-level and pixel-level tracking, which integrates RGB features with bird's-eye-view representations to better exploit cross-modality 3D geometry. In-depth empirical analysis verifies that the ARKitTrack dataset can significantly facilitate RGB-D tracking and that the proposed baseline method compares favorably against state-of-the-art methods. The source code and dataset will be released.
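To illustrate the cross-modality 3D geometry the abstract refers to, the sketch below back-projects a metric depth map into camera-space 3D points using per-frame intrinsics and rasterizes them into a coarse bird's-eye-view occupancy grid. This is a minimal, hypothetical example, not the authors' baseline: the function names (depth_to_points, points_to_bev), the intrinsics values, and the BEV range/resolution are placeholders.

import numpy as np

def depth_to_points(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project an HxW metric depth map into camera-space 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))      # pixel coordinates
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]                     # X = (u - cx) * Z / fx
    y = (v - K[1, 2]) * z / K[1, 1]                     # Y = (v - cy) * Z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                           # drop invalid depth

def points_to_bev(pts: np.ndarray,
                  bev_range=(-2.0, 2.0, 0.0, 4.0),      # placeholder X/Z extent in meters
                  bev_size=(64, 64)) -> np.ndarray:
    """Rasterize 3D points into a binary BEV occupancy grid over the X-Z plane."""
    x_min, x_max, z_min, z_max = bev_range
    rows, cols = bev_size
    keep = (pts[:, 0] >= x_min) & (pts[:, 0] < x_max) & \
           (pts[:, 2] >= z_min) & (pts[:, 2] < z_max)
    pts = pts[keep]
    col = ((pts[:, 0] - x_min) / (x_max - x_min) * cols).astype(int)
    row = ((pts[:, 2] - z_min) / (z_max - z_min) * rows).astype(int)
    bev = np.zeros(bev_size, dtype=np.float32)
    bev[row, col] = 1.0                                 # mark occupied cells
    return bev

if __name__ == "__main__":
    K = np.array([[600.0, 0.0, 320.0],                  # example intrinsics (fx, fy, cx, cy)
                  [0.0, 600.0, 240.0],
                  [0.0, 0.0, 1.0]])
    depth = np.random.uniform(0.5, 3.0, size=(480, 640))  # placeholder depth map in meters
    bev = points_to_bev(depth_to_points(depth, K))
    print(bev.shape, bev.sum())

In a tracking pipeline, a BEV map like this could be computed per frame and fused with RGB features; the actual fusion design used in the paper's baseline is not reproduced here.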
Cite
Text
Zhao et al. "ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00496

Markdown
[Zhao et al. "ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/zhao2023cvpr-arkittrack/) doi:10.1109/CVPR52729.2023.00496

BibTeX
@inproceedings{zhao2023cvpr-arkittrack,
title = {{ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data}},
author = {Zhao, Haojie and Chen, Junsong and Wang, Lijun and Lu, Huchuan},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {5126--5135},
doi = {10.1109/CVPR52729.2023.00496},
url = {https://mlanthology.org/cvpr/2023/zhao2023cvpr-arkittrack/}
}