Compact and Efficient Multitask Learning in Vision, Language and Speech

Abstract

Cross-domain multitask learning is a challenging area of computer vision and machine learning due to the intra-similarities among class distributions. Addressing this problem in a way that mirrors the human cognition system, by considering both inter- and intra-class categorization and recognition, complicates it even further. In this work we propose an effective holistic and hierarchical learning approach that places a text embedding layer on top of a deep learning model. We also propose a novel sensory discriminator to resolve collisions between different tasks and domains. We then train the model concurrently on textual sentiment analysis, speech recognition, image classification, action recognition from video, and handwritten word spotting in two different scripts (Arabic and English). The proposed model successfully learns these different tasks across multiple domains.
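For orientation, the sketch below illustrates the kind of architecture the abstract describes: a shared deep model whose output is projected into a text-embedding space common to all tasks, plus a small "sensory discriminator" head that predicts which task or domain an input came from. This is a minimal PyTorch sketch under assumed module names, dimensions, and losses; it is not the authors' implementation.

# Minimal sketch of a multitask model with a shared text-embedding
# output space and a task/domain discriminator. All names and sizes
# here are hypothetical, chosen only to illustrate the idea.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultitaskEmbeddingNet(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=300, num_tasks=6):
        super().__init__()
        # Shared backbone; in practice this would be a task-appropriate
        # encoder (e.g., a CNN for images/video frames, a recurrent or
        # convolutional encoder for audio and text).
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        # Text-embedding layer: maps shared features into a label
        # embedding space, so every task is trained against embeddings
        # of its class names rather than per-task one-hot labels.
        self.embed_head = nn.Linear(512, embed_dim)
        # Discriminator head: predicts the source task/domain, used to
        # resolve collisions between tasks that share the embedding space.
        self.task_head = nn.Linear(512, num_tasks)

    def forward(self, x):
        h = self.backbone(x)
        return self.embed_head(h), self.task_head(h)

def multitask_loss(pred_embed, target_embed, task_logits, task_ids):
    # Pull predicted embeddings toward the label embeddings (cosine
    # loss) and train the discriminator with cross-entropy on task identity.
    ones = torch.ones(pred_embed.size(0), device=pred_embed.device)
    cos = F.cosine_embedding_loss(pred_embed, target_embed, ones)
    ce = F.cross_entropy(task_logits, task_ids)
    return cos + ce

Under this reading, inference would amount to a nearest-neighbour search in the embedding space, restricted to the label vocabulary of whichever task the discriminator selects for the input.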

Cite

Text

Al-Rawi and Valveny. "Compact and Efficient Multitask Learning in Vision, Language and Speech." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00355

Markdown

[Al-Rawi and Valveny. "Compact and Efficient Multitask Learning in Vision, Language and Speech." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/alrawi2019iccvw-compact/) doi:10.1109/ICCVW.2019.00355

BibTeX

@inproceedings{alrawi2019iccvw-compact,
  title     = {{Compact and Efficient Multitask Learning in Vision, Language and Speech}},
  author    = {Al-Rawi, Mohammed and Valveny, Ernest},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {2933--2942},
  doi       = {10.1109/ICCVW.2019.00355},
  url       = {https://mlanthology.org/iccvw/2019/alrawi2019iccvw-compact/}
}