A Mid-Level Representation of Visual Structures for Video Compression
Abstract
A video coding system is presented that partitions the scene into "visual structures" and a residual "background" layer. A low-level representation ("track-template") of visual structures is proposed that exploits their temporal redundancy. A dictionary of track-templates is constructed that is used to encode video frames. We make optimal use of the dictionary in terms of rate-distortion by choosing a subset of the dictionary's elements for encoding using a Markov Random Field (MRF) formulation that places the track-templates in "depth" layers. The selected "track-templates" form the mid-level representation of the "visual structure" regions of the video. Our video coding system offers improvements over H.265/H.264 and other methods in a rate-distortion comparison.
Cite
Text
Georgiadis and Soatto. "A Mid-Level Representation of Visual Structures for Video Compression." IEEE/CVF Winter Conference on Applications of Computer Vision, 2016. doi:10.1109/WACV.2016.7477703
Markdown
[Georgiadis and Soatto. "A Mid-Level Representation of Visual Structures for Video Compression." IEEE/CVF Winter Conference on Applications of Computer Vision, 2016.](https://mlanthology.org/wacv/2016/georgiadis2016wacv-mid/) doi:10.1109/WACV.2016.7477703
BibTeX
@inproceedings{georgiadis2016wacv-mid,
title = {{A Mid-Level Representation of Visual Structures for Video Compression}},
author = {Georgiadis, Georgios and Soatto, Stefano},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2016},
pages = {1-8},
doi = {10.1109/WACV.2016.7477703},
url = {https://mlanthology.org/wacv/2016/georgiadis2016wacv-mid/}
}