FacePhi: Lightweight Multimodal Large Language Model for Facial Landmark Emotion Recognition
Abstract
We introduce FacePhi, a multimodal large language model (LLM) for emotion recognition from facial landmarks. By focusing on facial landmarks rather than raw images, FacePhi preserves privacy in emotion detection tasks. FacePhi is optimized for computational efficiency by incorporating Phi-2, an LLM with a small parameter count, and by utilizing lightweight facial landmark data. This design choice makes FacePhi suitable for deployment in resource-constrained settings. Our investigation highlights the importance of feature alignment during training, indicating its pivotal role in enhancing the model's performance on the challenging task of facial landmark emotion recognition.
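The abstract describes an architecture in which landmark features are aligned with a frozen Phi-2 backbone. Below is a minimal PyTorch sketch of how such landmark-to-LLM feature alignment could look; it is not the authors' code, and the landmark count (68), number of soft tokens, hidden size, and module names are illustrative assumptions (2560 is Phi-2's actual hidden dimension).

```python
# Hypothetical sketch of landmark feature alignment for an LLM such as Phi-2.
# Assumptions: 68 2D landmarks per face, 8 soft tokens, Phi-2 hidden size 2560.
import torch
import torch.nn as nn

class LandmarkProjector(nn.Module):
    """Encodes 2D facial landmarks and projects them into the LLM's
    token-embedding space, so they can be prepended to text embeddings."""
    def __init__(self, num_landmarks=68, llm_dim=2560, hidden=512, num_tokens=8):
        super().__init__()
        self.num_tokens = num_tokens
        self.llm_dim = llm_dim
        self.encoder = nn.Sequential(
            nn.Linear(num_landmarks * 2, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_tokens * llm_dim),
        )

    def forward(self, landmarks):
        # landmarks: (batch, num_landmarks, 2) -> (batch, num_tokens, llm_dim)
        flat = landmarks.flatten(1)          # (batch, num_landmarks * 2)
        tokens = self.encoder(flat)          # (batch, num_tokens * llm_dim)
        return tokens.view(-1, self.num_tokens, self.llm_dim)

# In a typical alignment phase, only this projector is trained while the LLM
# stays frozen; the projected "landmark tokens" would be concatenated with the
# prompt's text embeddings before being fed to the language model.
projector = LandmarkProjector()
landmarks = torch.randn(2, 68, 2)            # dummy batch of landmark coordinates
landmark_tokens = projector(landmarks)       # (2, 8, 2560), ready to prepend
print(landmark_tokens.shape)
```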
Cite
Text
Zhao et al. "FacePhi: Lightweight Multimodal Large Language Model for Facial Landmark Emotion Recognition." ICLR 2024 Workshops: PML4LRS, 2024.

Markdown

[Zhao et al. "FacePhi: Lightweight Multimodal Large Language Model for Facial Landmark Emotion Recognition." ICLR 2024 Workshops: PML4LRS, 2024.](https://mlanthology.org/iclrw/2024/zhao2024iclrw-facephi/)

BibTeX
@inproceedings{zhao2024iclrw-facephi,
title = {{FacePhi: Lightweight Multimodal Large Language Model for Facial Landmark Emotion Recognition}},
author = {Zhao, Hongjin and Liu, Zheyuan and Liu, Yang and Qin, Zhenyue and Liu, Jiaxu and Gedeon, Tom},
booktitle = {ICLR 2024 Workshops: PML4LRS},
year = {2024},
url = {https://mlanthology.org/iclrw/2024/zhao2024iclrw-facephi/}
}