ESP-PCT: Enhanced VR Semantic Performance Through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers

Mei, Luoyu; Wang, Shuai; Cheng, Yun; Liu, Ruofeng; Yin, Zhimeng; Jiang, Wenchao; Wang, Shuai; Gong, Wei

doi:10.24963/ijcai.2024/131

IJCAI 2024 pp. 1182-1190

Abstract

Video-to-Audio (V2A) generation has achieved significant progress and plays a crucial role in film and video post-production. However, current methods overlook cinematic language, a critical component of artistic expression in filmmaking. As a result, their performance deteriorates in scenarios where Foley targets are only partially visible. To address this challenge, we propose a simple self-distillation approach to extend V2A models to cinematic language scenarios. By simulating cinematic language variations, the student model learns to align the video features of training pairs with the same audio-visual correspondences, enabling it to effectively capture the associations between sounds and partial visual information. Our method not only achieves improvements under partial visibility across all evaluation metrics, but also enhances performance on the large-scale V2A dataset VGGSound.

Cite

Text

Mei et al. "ESP-PCT: Enhanced VR Semantic Performance Through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/131

Markdown

[Mei et al. "ESP-PCT: Enhanced VR Semantic Performance Through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/mei2024ijcai-esp/) doi:10.24963/ijcai.2024/131

BibTeX

@inproceedings{mei2024ijcai-esp,
  title     = {{ESP-PCT: Enhanced VR Semantic Performance Through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers}},
  author    = {Mei, Luoyu and Wang, Shuai and Cheng, Yun and Liu, Ruofeng and Yin, Zhimeng and Jiang, Wenchao and Wang, Shuai and Gong, Wei},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {1182-1190},
  doi       = {10.24963/ijcai.2024/131},
  url       = {https://mlanthology.org/ijcai/2024/mei2024ijcai-esp/}
}