Face Detection with End-to-End Integration of a ConvNet and a 3D Model

Abstract

This paper presents a method for face detection in the wild that integrates a ConvNet and a 3D mean face model in an end-to-end multi-task discriminative learning framework. The 3D mean face model is predefined and fixed (e.g., we used the one provided in the AFLW dataset). The ConvNet consists of two components: (i) The face proposal component computes face bounding box proposals by estimating facial key-points and the 3D transformation (rotation and translation) parameters for each predicted key-point w.r.t. the 3D mean face model. (ii) The face verification component computes detection results by pruning and refining proposals using configuration pooling based on facial key-points. The proposed method addresses two issues in adapting state-of-the-art generic object detection ConvNets (e.g., Faster R-CNN) for face detection: (i) One is to eliminate the heuristic design of predefined anchor boxes in the region proposal network (RPN) by exploiting a 3D mean face model. (ii) The other is to replace the generic RoI (Region-of-Interest) pooling layer with a configuration pooling layer that respects the underlying object structure. The multi-task loss consists of three terms: the classification Softmax loss and the smooth \(l_1\) location losses of both the facial key-points and the face bounding boxes. In experiments, our ConvNet is trained on the AFLW dataset only and tested on the FDDB benchmark with fine-tuning and on the AFW benchmark without fine-tuning. The proposed method obtains performance competitive with the state of the art on both benchmarks.
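As a rough illustration of the three-term multi-task loss named in the abstract, the sketch below sums a Softmax classification loss with smooth \(l_1\) location losses for the key-points and the bounding box. It is not the authors' implementation. The function names, the weights `w_kp` and `w_box`, the threshold of 1 in the smooth \(l_1\) form (the one used in Fast R-CNN), and the choice to apply location terms only to positive examples are all assumptions made for this example.

```python
import math


def softmax_cross_entropy(logits, label):
    """Negative log-probability of the true class under a softmax."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def smooth_l1(pred, target):
    """Smooth l1 loss summed over coordinates (assumed threshold of 1)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large residuals.
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def multi_task_loss(logits, label, kp_pred, kp_gt, box_pred, box_gt,
                    w_kp=1.0, w_box=1.0):
    """Classification loss plus weighted key-point and box location losses.

    w_kp and w_box are hypothetical weights. Location terms are applied
    only to positive examples, assumed here to be label == 1 (face).
    """
    loss = softmax_cross_entropy(logits, label)
    if label == 1:
        loss += w_kp * smooth_l1(kp_pred, kp_gt)
        loss += w_box * smooth_l1(box_pred, box_gt)
    return loss


# Example call: two-class logits, one key-point (x, y), one box (4 values).
example = multi_task_loss(
    logits=[0.0, 0.0], label=1,
    kp_pred=[0.5, 2.0], kp_gt=[0.0, 0.0],
    box_pred=[1.0, 1.0, 1.0, 1.0], box_gt=[1.0, 1.0, 1.0, 1.0],
)
```

The smooth \(l_1\) form is less sensitive to outlying residuals than a squared loss, which is the usual reason for choosing it for location regression.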

Cite

Text

Li et al. "Face Detection with End-to-End Integration of a ConvNet and a 3D Model." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46487-9_26

Markdown

[Li et al. "Face Detection with End-to-End Integration of a ConvNet and a 3D Model." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/li2016eccv-face/) doi:10.1007/978-3-319-46487-9_26

BibTeX

@inproceedings{li2016eccv-face,
  title     = {{Face Detection with End-to-End Integration of a ConvNet and a 3D Model}},
  author    = {Li, Yunzhu and Sun, Benyuan and Wu, Tianfu and Wang, Yizhou},
  booktitle = {European Conference on Computer Vision},
  year      = {2016},
  pages     = {420--436},
  doi       = {10.1007/978-3-319-46487-9_26},
  url       = {https://mlanthology.org/eccv/2016/li2016eccv-face/}
}