LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos

Abstract

Physical agility is a necessary skill in competitive table tennis, but by no means sufficient. Champions excel in this fast-paced and highly dynamic environment by anticipating their opponent's intent, buying themselves the necessary time to react. In this work, we take one step towards designing such an anticipatory agent. Previous works have developed systems capable of real-time table tennis gameplay, though they often do not leverage anticipation. The works that do forecast opponent actions are limited by dataset size and variety. Our paper contributes (1) a scalable system for reconstructing monocular video of table tennis matches in 3D and (2) an uncertainty-aware controller that anticipates opponent actions. We demonstrate in simulation that our policy improves the ball return rate against high-speed hits from 49.9% to 59.0% compared to a baseline non-anticipatory policy.

Cite

Text

Etaat et al. "LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00667

Markdown

[Etaat et al. "LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/etaat2025cvpr-lattemv/) doi:10.1109/CVPR52734.2025.00667

BibTeX

@inproceedings{etaat2025cvpr-lattemv,
  title     = {{LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos}},
  author    = {Etaat, Daniel and Kalaria, Dvij and Rahmanian, Nima and Sastry, S. Shankar},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {7115--7124},
  doi       = {10.1109/CVPR52734.2025.00667},
  url       = {https://mlanthology.org/cvpr/2025/etaat2025cvpr-lattemv/}
}