Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition
Abstract
Dynamic Facial Expression Recognition (DFER) is a rapidly developing field that focuses on recognizing facial expressions in videos. Previous research has treated non-target frames as noisy frames, but we propose that DFER should instead be treated as a weakly supervised problem. We also identify an imbalance between short- and long-term temporal relationships in DFER. Therefore, we introduce the Multi-3D Dynamic Facial Expression Learning (M3DFEL) framework, which utilizes Multi-Instance Learning (MIL) to handle inexact labels. M3DFEL generates 3D-instances to model the strong short-term temporal relationship and utilizes 3DCNNs for feature extraction. The Dynamic Long-term Instance Aggregation Module (DLIAM) is then utilized to learn the long-term temporal relationships and dynamically aggregate the instances. Our experiments on the DFEW and FERV39K datasets show that M3DFEL, using a vanilla R3D18 backbone, outperforms existing state-of-the-art approaches. The source code is available at https://github.com/faceeyes/M3DFEL.
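The multi-instance view described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`make_instances`, `mean_feature`, `aggregate_bag`), the frame averaging that stands in for a 3DCNN, and the softmax attention pooling that stands in for DLIAM are all assumptions chosen for illustration. The only idea taken from the abstract is that a video is a bag of short 3D-instances whose features are aggregated into one video-level representation, so that a single video-level label can supervise the whole bag.

```python
import math


def make_instances(frames, instance_len):
    """Split a clip into consecutive, non-overlapping instances of instance_len frames.

    Trailing frames that do not fill a whole instance are dropped.
    """
    if instance_len <= 0:
        raise ValueError("instance_len must be positive")
    usable = len(frames) - len(frames) % instance_len
    return [frames[i:i + instance_len] for i in range(0, usable, instance_len)]


def mean_feature(instance):
    """Hypothetical stand-in for a 3DCNN: average the frame vectors of one instance."""
    dim = len(instance[0])
    return [sum(frame[d] for frame in instance) / len(instance) for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_bag(instance_features, scoring_weights):
    """Attention-style MIL pooling (an assumed simplification, not DLIAM itself).

    Each instance gets a scalar score (dot product with scoring_weights); the bag
    feature is the softmax-weighted sum of the instance features.
    """
    scores = [sum(w * x for w, x in zip(scoring_weights, feat))
              for feat in instance_features]
    attn = softmax(scores)
    dim = len(instance_features[0])
    bag = [sum(a * feat[d] for a, feat in zip(attn, instance_features))
           for d in range(dim)]
    return bag, attn


# Toy clip: 16 frames, each represented by a 2-dimensional vector.
clip = [[float(t), 1.0] for t in range(16)]
instances = make_instances(clip, 4)            # four instances of four frames each
features = [mean_feature(inst) for inst in instances]
bag_feature, attention = aggregate_bag(features, [0.1, 0.0])
```

In this toy setup the attention weights sum to one and favor the later instances, because the assumed scoring vector rewards larger values in the first feature dimension. In a trained model the scores would be learned, so the instances that carry the target expression could dominate the video-level prediction.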
Cite
Text
Wang et al. "Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01722
Markdown
[Wang et al. "Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/wang2023cvpr-rethinking/) doi:10.1109/CVPR52729.2023.01722
BibTeX
@inproceedings{wang2023cvpr-rethinking,
title = {{Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition}},
author = {Wang, Hanyang and Li, Bo and Wu, Shuang and Shen, Siyuan and Liu, Feng and Ding, Shouhong and Zhou, Aimin},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {17958-17968},
doi = {10.1109/CVPR52729.2023.01722},
url = {https://mlanthology.org/cvpr/2023/wang2023cvpr-rethinking/}
}