Self-Supervised Multi-Task Pretraining Improves Image Aesthetic Assessment
Abstract
Neural networks for Image Aesthetic Assessment are usually initialized with weights of pretrained ImageNet models and then trained using a labeled image aesthetics dataset. We argue that the ImageNet classification task is not well-suited for pretraining, since content-based classification is designed to make the model invariant to features that strongly influence the image's aesthetics, e.g. style-based features such as brightness or contrast. We propose to use self-supervised aesthetic-aware pretext tasks that let the network learn aesthetically relevant features, based on the observation that distorting aesthetic images with image filters usually reduces their appeal. To ensure that images are not accidentally improved when filters are applied, we introduce a large dataset comprised of highly aesthetic images as the starting point for the distortions. The network is then trained to rank less distorted images higher than their more distorted counterparts. To exploit the effects of multiple different objectives, we also embed this task into a multi-task setting by adding either a self-supervised classification or regression task. In our experiments, we show that our pretraining improves performance over the ImageNet initialization and reduces the number of epochs until convergence by up to 47%. Additionally, we can match the performance of an ImageNet-initialized model while reducing the labeled training data by 20%. We make our code, data, and pretrained models available.
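To make the ranking pretext task concrete, below is a minimal sketch of one training step, assuming a PyTorch setup. The distortion function, backbone choice, margin, and all hyperparameters are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn
from torchvision import models

# Backbone with a single scalar output used as an aesthetic score.
# (resnet50 and the random initialization are assumptions for this sketch.)
backbone = models.resnet50(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, 1)

rank_loss = nn.MarginRankingLoss(margin=0.1)
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def distort(images: torch.Tensor, strength: float) -> torch.Tensor:
    """Toy stand-in for one of the paper's image filters:
    reduce contrast proportionally to `strength` in [0, 1]."""
    mean = images.mean(dim=(2, 3), keepdim=True)
    return mean + (1.0 - strength) * (images - mean)

def pretext_step(images: torch.Tensor) -> torch.Tensor:
    """One self-supervised update on a batch of highly aesthetic images,
    shape (B, 3, H, W): the less distorted version should score higher."""
    weak = distort(images, strength=0.2)
    strong = distort(images, strength=0.6)
    s_weak = backbone(weak).squeeze(1)
    s_strong = backbone(strong).squeeze(1)
    # target = +1 means the first input should be ranked above the second
    target = torch.ones_like(s_weak)
    loss = rank_loss(s_weak, s_strong, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss

In the multi-task variants described in the abstract, an additional head that predicts, e.g., which filter was applied (classification) or the distortion strength (regression) would be trained jointly with this ranking objective.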
Cite
Text
Pfister et al. "Self-Supervised Multi-Task Pretraining Improves Image Aesthetic Assessment." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. doi:10.1109/CVPRW53098.2021.00091
Markdown
[Pfister et al. "Self-Supervised Multi-Task Pretraining Improves Image Aesthetic Assessment." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021.](https://mlanthology.org/cvprw/2021/pfister2021cvprw-selfsupervised/) doi:10.1109/CVPRW53098.2021.00091
BibTeX
@inproceedings{pfister2021cvprw-selfsupervised,
title = {{Self-Supervised Multi-Task Pretraining Improves Image Aesthetic Assessment}},
author = {Pfister, Jan and Kobs, Konstantin and Hotho, Andreas},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2021},
pages = {816-825},
doi = {10.1109/CVPRW53098.2021.00091},
url = {https://mlanthology.org/cvprw/2021/pfister2021cvprw-selfsupervised/}
}