Replay: Multi-Modal Multi-View Acted Videos for Casual Holography

Abstract

We introduce Replay, a collection of multi-view, multi-modal videos of humans interacting socially. Each scene is filmed in high production quality from different viewpoints, using several static cameras as well as wearable action cameras, and is recorded with a large array of microphones placed at different positions in the room. Overall, the dataset contains over 3000 minutes of footage and over 5 million timestamped high-resolution frames, annotated with camera poses and partially with foreground masks. The Replay dataset has many potential applications, such as novel-view synthesis, 3D reconstruction, novel-view acoustic synthesis, human body and face analysis, and training generative models. We provide a benchmark for training and evaluating novel-view synthesis, with two scenarios of different difficulty. Finally, we evaluate several state-of-the-art baseline methods on the new benchmark.

Cite

Text

Shapovalov et al. "Replay: Multi-Modal Multi-View Acted Videos for Casual Holography." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01860

Markdown

[Shapovalov et al. "Replay: Multi-Modal Multi-View Acted Videos for Casual Holography." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/shapovalov2023iccv-replay/) doi:10.1109/ICCV51070.2023.01860

BibTeX

@inproceedings{shapovalov2023iccv-replay,
  title     = {{Replay: Multi-Modal Multi-View Acted Videos for Casual Holography}},
  author    = {Shapovalov, Roman and Kleiman, Yanir and Rocco, Ignacio and Novotny, David and Vedaldi, Andrea and Chen, Changan and Kokkinos, Filippos and Graham, Ben and Neverova, Natalia},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {20338--20348},
  doi       = {10.1109/ICCV51070.2023.01860},
  url       = {https://mlanthology.org/iccv/2023/shapovalov2023iccv-replay/}
}