Tracking Everything Everywhere Across Multiple Cameras
Abstract
Pixel tracking in single-view video sequences has recently emerged as a significant area of research. While previous work has primarily concentrated on tracking within a given video, we propose to extend pixel correspondence estimation to multi-view scenarios. The central concept is a canonical space that maintains a universal 3D representation across different views and timesteps. This model allows points to be tracked precisely even through prolonged occlusions and significant appearance deformations between views. Moreover, we show that our model, through an efficient training strategy incorporating a distillation loss, is capable of incremental pixel tracking, a capability generally considered difficult for test-time optimization techniques. Comprehensive experiments validate the method's ability to accurately establish point correspondences across cameras. Furthermore, our method achieves promising results on multi-view pixel tracking without requiring entire video sequences to be provided at once.
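The abstract describes relating pixels across cameras and timesteps through a shared canonical 3D space, but does not spell out the mapping itself. The following is a minimal, purely illustrative Python sketch of that composition idea only, not the authors' architecture: hypothetical invertible affine maps stand in for whatever learned per-(camera, time) mapping the method uses, and a point is taken from the source view into canonical space and then pulled back into the target view.

import numpy as np

class AffineMap:
    """Invertible map from one (camera, time) local 3D frame to canonical space.
    A stand-in for a learned invertible mapping; purely illustrative."""
    def __init__(self, rotation, translation):
        self.R = np.asarray(rotation, dtype=float)     # 3x3, assumed invertible
        self.t = np.asarray(translation, dtype=float)  # 3-vector

    def to_canonical(self, x):
        return self.R @ x + self.t

    def from_canonical(self, u):
        return np.linalg.solve(self.R, u - self.t)

def correspond(x_src, map_src, map_dst):
    """Relate a 3D point between two (camera, time) frames by routing it
    through the shared canonical space."""
    u = map_src.to_canonical(x_src)    # source frame -> canonical space
    return map_dst.from_canonical(u)   # canonical space -> target frame

# Toy usage: a point observed by camera 0 at t=0, located in camera 1 at t=5.
map_cam0_t0 = AffineMap(np.eye(3), np.zeros(3))
map_cam1_t5 = AffineMap(0.9 * np.eye(3), [0.1, 0.0, -0.2])
x_cam0 = np.array([0.3, -0.4, 1.2])
x_cam1 = correspond(x_cam0, map_cam0_t0, map_cam1_t5)

Because every frame is tied to the same canonical representation, correspondences between any pair of cameras and timesteps reduce to composing one mapping with the inverse of another, which is what enables tracking through occlusions in a single view.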
Cite
Text
Wang et al. "Tracking Everything Everywhere Across Multiple Cameras." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I7.32839
Markdown
[Wang et al. "Tracking Everything Everywhere Across Multiple Cameras." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wang2025aaai-tracking/) doi:10.1609/AAAI.V39I7.32839
BibTeX
@inproceedings{wang2025aaai-tracking,
title = {{Tracking Everything Everywhere Across Multiple Cameras}},
author = {Wang, Li-Heng and Cheng, YuJu and Liu, Tyng-Luh},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {7789-7797},
doi = {10.1609/AAAI.V39I7.32839},
url = {https://mlanthology.org/aaai/2025/wang2025aaai-tracking/}
}