Mental-Perceiver: Audio-Textual Multi-Modal Learning for Estimating Mental Disorders

Abstract

Mental disorders, such as anxiety and depression, have become a global concern that affects people of all ages. Early detection and treatment are crucial to mitigate the negative effects these disorders can have on daily life. Although AI-based detection methods show promise, progress is hindered by the lack of publicly available large-scale datasets. To address this, we introduce the Multi-Modal Psychological assessment corpus (MMPsy), a large-scale dataset containing audio recordings and transcripts from Mandarin-speaking adolescents undergoing automated anxiety/depression assessment interviews. MMPsy also includes self-reported anxiety/depression evaluations using standardized psychological questionnaires. Leveraging this dataset, we propose Mental-Perceiver, a deep learning model for estimating mental disorders from audio and textual data. Extensive experiments on MMPsy and the DAIC-WOZ dataset demonstrate the effectiveness of Mental-Perceiver in anxiety and depression detection.

Cite

Text

Qin et al. "Mental-Perceiver: Audio-Textual Multi-Modal Learning for Estimating Mental Disorders." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I23.34687

Markdown

[Qin et al. "Mental-Perceiver: Audio-Textual Multi-Modal Learning for Estimating Mental Disorders." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/qin2025aaai-mental/) doi:10.1609/AAAI.V39I23.34687

BibTeX

@inproceedings{qin2025aaai-mental,
  title     = {{Mental-Perceiver: Audio-Textual Multi-Modal Learning for Estimating Mental Disorders}},
  author    = {Qin, Jinghui and Liu, Changsong and Tang, Tianchi and Liu, Dahuang and Wang, Minghao and Huang, Qianying and Zhang, Rumin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {25029--25037},
  doi       = {10.1609/AAAI.V39I23.34687},
  url       = {https://mlanthology.org/aaai/2025/qin2025aaai-mental/}
}