Telepresence Video Quality Assessment
Abstract
Efficient and accurate video quality tools are needed to monitor and perceptually optimize telepresence traffic streamed via Zoom, Webex, Meet, and similar services. However, existing models are limited in their ability to predict the quality of multi-modal, live-streamed telepresence content. Here we address the significant challenges of Telepresence Video Quality Assessment (TVQA) in several ways. First, we mitigated the dearth of subjectively labeled data by collecting ~2k telepresence videos from different countries, on which we crowdsourced ~80k subjective quality labels. Using this new resource, we created a first-of-its-kind online video quality prediction framework for live streaming, built on a multi-modal learning architecture with separate pathways that compute visual and audio quality predictions. Our all-in-one model provides accurate quality predictions at the patch, frame, clip, and audiovisual levels. It achieves state-of-the-art performance on both existing quality databases and our new TVQA database, at considerably lower computational expense, making it an attractive solution for mobile and embedded systems.
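The separate visual and audio pathways with a fused audiovisual output can be illustrated with a short sketch. Below is a minimal PyTorch example of late fusion producing frame-, clip-, and audiovisual-level scores; the module names (VisualPathway, AudioPathway, TVQAModel) and all layer choices are hypothetical assumptions for illustration, not the paper's actual architecture.

# Minimal sketch of a two-pathway audiovisual quality predictor with late
# fusion. All module and parameter names are hypothetical illustrations,
# not the paper's actual architecture.
import torch
import torch.nn as nn

class VisualPathway(nn.Module):
    """Maps a clip of frames to per-frame quality features (hypothetical)."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(  # lightweight per-frame CNN
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W) -> per-frame features (batch, time, feat_dim)
        b, t, c, h, w = frames.shape
        x = self.backbone(frames.reshape(b * t, c, h, w)).flatten(1)
        return self.proj(x).reshape(b, t, -1)

class AudioPathway(nn.Module):
    """Maps a log-mel spectrogram to audio quality features (hypothetical)."""
    def __init__(self, n_mels: int = 64, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, n_mels, time) -> (batch, feat_dim)
        return self.proj(self.net(spec).squeeze(-1))

class TVQAModel(nn.Module):
    """Late fusion of the two pathways into frame-, clip-, and AV-level scores."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.visual = VisualPathway(feat_dim)
        self.audio = AudioPathway(feat_dim=feat_dim)
        self.frame_head = nn.Linear(feat_dim, 1)   # per-frame quality score
        self.av_head = nn.Linear(2 * feat_dim, 1)  # fused audiovisual score

    def forward(self, frames, spec):
        v = self.visual(frames)                    # (b, t, d) visual features
        frame_q = self.frame_head(v).squeeze(-1)   # (b, t) frame-level scores
        clip_q = frame_q.mean(dim=1)               # (b,) clip-level by temporal pooling
        a = self.audio(spec)                       # (b, d) audio features
        av_q = self.av_head(torch.cat([v.mean(1), a], dim=-1)).squeeze(-1)
        return frame_q, clip_q, av_q

# Example: one 8-frame 96x96 clip with a 64-mel, 200-step spectrogram.
model = TVQAModel()
frame_q, clip_q, av_q = model(torch.randn(1, 8, 3, 96, 96), torch.randn(1, 64, 200))

Keeping the pathways separate until a single fusion head is one simple way to realize the "separate pathways, all-in-one model" idea described in the abstract, since each pathway can be trained or evaluated independently before the fused audiovisual prediction.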
Cite
Text
Ying et al. "Telepresence Video Quality Assessment." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19836-6
Markdown
[Ying et al. "Telepresence Video Quality Assessment." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/ying2022eccv-telepresence/) doi:10.1007/978-3-031-19836-6
BibTeX
@inproceedings{ying2022eccv-telepresence,
title = {{Telepresence Video Quality Assessment}},
author = {Ying, Zhenqiang and Ghadiyaram, Deepti and Bovik, Alan},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-19836-6},
url = {https://mlanthology.org/eccv/2022/ying2022eccv-telepresence/}
}