MFAS: Multimodal Fusion Architecture Search

Abstract

We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. To find an optimal architecture for a given dataset in the proposed search space, we leverage an efficient sequential model-based exploration approach that is tailored to the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem through extensive experiments on a toy dataset and two real multimodal datasets. We discover fusion architectures that exhibit state-of-the-art performance for problems across different domains and dataset sizes, including the NTU RGB+D dataset, the largest multimodal action recognition dataset available.

Cite

Text

Perez-Rua et al. "MFAS: Multimodal Fusion Architecture Search." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.00713

Markdown

[Perez-Rua et al. "MFAS: Multimodal Fusion Architecture Search." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/perezrua2019cvpr-mfas/) doi:10.1109/CVPR.2019.00713

BibTeX

@inproceedings{perezrua2019cvpr-mfas,
  title     = {{MFAS: Multimodal Fusion Architecture Search}},
  author    = {Perez-Rua, Juan-Manuel and Vielzeuf, Valentin and Pateux, Stephane and Baccouche, Moez and Jurie, Frederic},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.00713},
  url       = {https://mlanthology.org/cvpr/2019/perezrua2019cvpr-mfas/}
}