Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition

Abstract

From video, we reconstruct a neural volume that captures time-varying color, density, scene flow, semantics, and attention information. The semantics and attention let us identify salient foreground objects separately from the background across spacetime. To mitigate the low resolution of the semantic and attention features, we compute pyramids that trade off detail against whole-image context. After optimization, we perform a saliency-aware clustering to decompose the scene. To evaluate real-world scenes, we annotate object masks in the NVIDIA Dynamic Scene and DyCheck datasets. We demonstrate that this method can decompose dynamic scenes in an unsupervised way, with performance competitive with a supervised method, and that it improves foreground/background segmentation over recent static/dynamic split methods. Project webpage: https://visual.cs.brown.edu/saff
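
To illustrate the pyramid idea, below is a minimal Python/PyTorch sketch of merging features extracted at multiple scales: the whole image supplies context while upscaled tiles supply detail. The backbone here, extract_features, is a hypothetical stand-in (e.g., a DINO-style ViT whose output is much coarser than its input), and the tiling and averaging rule is illustrative, not the paper's exact procedure.

import torch
import torch.nn.functional as F

def extract_features(img: torch.Tensor) -> torch.Tensor:
    """Placeholder backbone: (3, H, W) -> (C, H//8, W//8).
    Stands in for a low-resolution semantic/attention extractor."""
    C = 64
    return torch.randn(C, img.shape[1] // 8, img.shape[2] // 8)

def pyramid_features(img: torch.Tensor, levels: int = 2) -> torch.Tensor:
    """Average features from the whole image (context) with features
    from upscaled tiles (detail), all resampled to full resolution."""
    _, H, W = img.shape
    merged = []
    for lvl in range(levels):
        n = 2 ** lvl  # level 0: whole image; level 1: 2x2 tiles; ...
        feat = torch.zeros(64, H, W)  # 64 matches the placeholder C
        for i in range(n):
            for j in range(n):
                tile = img[:, i*H//n:(i+1)*H//n, j*W//n:(j+1)*W//n]
                # Upscale the tile so the backbone sees it in more detail.
                tile = F.interpolate(tile[None], size=(H, W),
                                     mode="bilinear", align_corners=False)[0]
                f = extract_features(tile)
                # Resample back to the tile's footprint in the full image.
                f = F.interpolate(f[None], size=(H//n, W//n),
                                  mode="bilinear", align_corners=False)[0]
                feat[:, i*H//n:(i+1)*H//n, j*W//n:(j+1)*W//n] = f
        merged.append(feat)
    return torch.stack(merged).mean(dim=0)  # (C, H, W)

# Usage: feat = pyramid_features(torch.rand(3, 256, 256))  # -> (64, 256, 256)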
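
The saliency-aware clustering can likewise be sketched in a few lines. Assuming per-point semantic feature vectors feats and scalar attention saliency sal rendered from the optimized volume, the k-means plus mean-saliency threshold below is an illustrative stand-in for the paper's exact decomposition step; all names and the threshold value are assumptions.

import numpy as np
from sklearn.cluster import KMeans

def decompose(feats: np.ndarray, sal: np.ndarray, k: int = 8,
              sal_thresh: float = 0.5):
    """Cluster semantic features, then split clusters into salient
    foreground objects vs. background by their mean saliency."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feats)
    cluster_sal = np.array([sal[labels == c].mean() for c in range(k)])
    is_fg = cluster_sal[labels] > sal_thresh  # per-point foreground mask
    return labels, is_fg

# Usage with random stand-in data:
feats = np.random.randn(1000, 64).astype(np.float32)  # (N, C) semantics
sal = np.random.rand(1000).astype(np.float32)         # (N,) saliency
labels, is_fg = decompose(feats, sal)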

Cite

Text

Liang et al. "Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01992

Markdown

[Liang et al. "Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/liang2023iccv-semantic/) doi:10.1109/ICCV51070.2023.01992

BibTeX

@inproceedings{liang2023iccv-semantic,
  title     = {{Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition}},
  author    = {Liang, Yiqing and Laidlaw, Eliot and Meyerowitz, Alexander and Sridhar, Srinath and Tompkin, James},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {21797--21806},
  doi       = {10.1109/ICCV51070.2023.01992},
  url       = {https://mlanthology.org/iccv/2023/liang2023iccv-semantic/}
}