CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
Abstract
Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline models. It also generalizes well to closed-ended classification tasks, achieving an average accuracy of 56.9% on unseen datasets. These findings highlight the transformative potential of integrating audio analysis with language-based reasoning to address key challenges in medical diagnostics, opening new possibilities for scalable, data-efficient AI systems capable of supporting real-world clinical decision-making.
Cite
Text
Wang et al. "CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning." Proceedings of the sixth Conference on Health, Inference, and Learning, 2025.
Markdown
[Wang et al. "CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning." Proceedings of the sixth Conference on Health, Inference, and Learning, 2025.](https://mlanthology.org/chil/2025/wang2025chil-careaqa/)
BibTeX
@inproceedings{wang2025chil-careaqa,
title = {{CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning}},
author = {Wang, Tsai-Ning and Chen, Lin-Lin and Zeghidour, Neil and Saeed, Aaqib},
booktitle = {Proceedings of the sixth Conference on Health, Inference, and Learning},
year = {2025},
pages = {231--246},
volume = {287},
url = {https://mlanthology.org/chil/2025/wang2025chil-careaqa/}
}