Visually Indicated Sound Generation by Perceptually Optimized Classification
Abstract
Visually indicated sound generation aims to predict sound that is visually consistent with the video content. Previous methods addressed this problem with a single generative model that ignores the distinctive characteristics of different sound categories. State-of-the-art sound classification networks are now available to capture semantic-level information in the audio modality, and they can also serve the purpose of visually indicated sound generation. In this paper, we explore generating fine-grained sound from a variety of sound classes, and leverage pre-trained sound classification networks to improve audio generation quality. We propose a novel Perceptually Optimized Classification based Audio generation Network (POCAN), which generates sound conditioned on the sound class predicted from visual information. Additionally, a perceptual loss is calculated via a pre-trained sound classification network to align the semantic information between the generated sound and its ground truth during training. Experiments show that POCAN achieves significantly better results on the visually indicated sound generation task on two datasets.
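The perceptual loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss formula, so everything here is an assumption: `toy_features` is a hypothetical stand-in for the frozen, pre-trained sound classification network, and the mean squared distance between feature vectors is one common choice of perceptual loss, not necessarily the one used in POCAN.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def toy_features(audio: Sequence[float]) -> Vector:
    """Hypothetical stand-in for features from a frozen, pre-trained
    sound classifier: mean energy and mean absolute sample-to-sample change."""
    n = len(audio)
    energy = sum(x * x for x in audio) / n
    delta = sum(abs(audio[i + 1] - audio[i]) for i in range(n - 1)) / max(n - 1, 1)
    return [energy, delta]


def perceptual_loss(
    generated: Sequence[float],
    target: Sequence[float],
    features: Callable[[Sequence[float]], Vector] = toy_features,
) -> float:
    """Mean squared distance between the feature vectors of the generated
    sound and its ground truth (assumed form of the loss)."""
    f_gen = features(generated)
    f_tgt = features(target)
    return sum((a - b) ** 2 for a, b in zip(f_gen, f_tgt)) / len(f_gen)


if __name__ == "__main__":
    silence = [0.0, 0.0, 0.0, 0.0]
    tone = [1.0, -1.0, 1.0, -1.0]
    print(perceptual_loss(tone, tone))     # identical signals give zero loss
    print(perceptual_loss(silence, tone))  # mismatched signals give a positive loss
```

In a training setup of this kind, the feature extractor would be kept fixed and only the generator would be updated to reduce the loss, so that the generated sound matches its ground truth at the semantic level rather than only sample by sample.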
Cite
Text
Chen et al. "Visually Indicated Sound Generation by Perceptually Optimized Classification." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11024-6_43
Markdown
[Chen et al. "Visually Indicated Sound Generation by Perceptually Optimized Classification." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/chen2018eccvw-visually/) doi:10.1007/978-3-030-11024-6_43
BibTeX
@inproceedings{chen2018eccvw-visually,
title = {{Visually Indicated Sound Generation by Perceptually Optimized Classification}},
author = {Chen, Kan and Zhang, Chuanxi and Fang, Chen and Wang, Zhaowen and Bui, Trung and Nevatia, Ram},
booktitle = {European Conference on Computer Vision Workshops},
year = {2018},
pages = {560-574},
doi = {10.1007/978-3-030-11024-6_43},
url = {https://mlanthology.org/eccvw/2018/chen2018eccvw-visually/}
}