Latent Representation Encoding and Multimodal Biomarkers for Post-Stroke Speech Assessment
Abstract
Post-stroke language impairments affect speech and language production, leading to lexical, semantic, syntactic, and articulatory-prosodic deficits. These disruptions extend from impaired cognitive-motor planning to execution, manifesting as altered vocal fold dynamics that compromise speech fluency and intelligibility. The high-dimensional and multimodal nature of these impairments poses significant challenges to traditional assessment methods, necessitating automated solutions that can capture the heterogeneity of disfluencies. We present a multimodal framework that integrates foundation model embeddings with clinically guided features for speech assessment. Leveraging SONIVA, our purpose-built database of approximately 600 post-stroke patients, we fine-tune Whisper to extract encoder embeddings that capture pathological speech characteristics. These representations are integrated with linguistic complexity metrics, physiological glottal parameters, and acoustic features through neural networks. Our model achieves 92.4% classification accuracy in stroke detection, outperforming feature-based methods, with SHAP analysis validating modality-specific importance. We further demonstrate real-world clinical utility through severity prediction on Comprehensive Aphasia Test (CAT) scores, achieving an N-RMSE of 0.1299. This framework establishes a clinically relevant approach for integrating speech representations with domain-specific biomarkers to potentially support diagnosis, severity tracking, and precision rehabilitation strategies.
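For readers unfamiliar with the severity metric, the following is a minimal sketch of how a normalized root-mean-square error (N-RMSE) can be computed. The abstract does not state which normalization the authors use, so dividing by the range of the observed scores is an assumption here (normalizing by the mean or standard deviation is also common). The function name `nrmse` and the example scores are hypothetical and are not taken from the paper.

```python
import math


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the true scores.

    Assumption: range normalization. The source abstract does not
    specify which normalization underlies its reported N-RMSE.
    """
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    score_range = max(y_true) - min(y_true)
    if score_range == 0:
        raise ValueError("true scores have zero range; N-RMSE is undefined")
    # Mean of squared differences between true and predicted scores.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / score_range


# Made-up illustrative scores, not data from the paper.
true_scores = [40.0, 50.0, 60.0, 70.0]
predicted_scores = [42.0, 48.0, 62.0, 68.0]
print(nrmse(true_scores, predicted_scores))
```

Under this convention, a value near 0 means prediction errors are small relative to the spread of the true scores, which makes results comparable across assessment scales with different ranges.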
Cite
Text
Sanguedolce et al. "Latent Representation Encoding and Multimodal Biomarkers for Post-Stroke Speech Assessment." ICLR 2025 Workshops: FM-Wild, 2025.
Markdown
[Sanguedolce et al. "Latent Representation Encoding and Multimodal Biomarkers for Post-Stroke Speech Assessment." ICLR 2025 Workshops: FM-Wild, 2025.](https://mlanthology.org/iclrw/2025/sanguedolce2025iclrw-latent/)
BibTeX
@inproceedings{sanguedolce2025iclrw-latent,
title = {{Latent Representation Encoding and Multimodal Biomarkers for Post-Stroke Speech Assessment}},
author = {Sanguedolce, Giulia and Gruia, Dragos-Cristian and Naylor, Patrick and Geranmayeh, Fatemeh},
booktitle = {ICLR 2025 Workshops: FM-Wild},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/sanguedolce2025iclrw-latent/}
}