CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning

Abstract

Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline models. It also generalizes well to closed-ended classification tasks, achieving an average accuracy of 56.9% on unseen datasets. These findings highlight the transformative potential of integrating audio analysis with language-based reasoning to address key challenges in medical diagnostics, opening new possibilities for scalable, data-efficient AI systems capable of supporting real-world clinical decision-making.

Cite

Text

Wang et al. "CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning." Proceedings of the sixth Conference on Health, Inference, and Learning, 2025.

Markdown

[Wang et al. "CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning." Proceedings of the sixth Conference on Health, Inference, and Learning, 2025.](https://mlanthology.org/chil/2025/wang2025chil-careaqa/)

BibTeX

@inproceedings{wang2025chil-careaqa,
  title     = {{CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning}},
  author    = {Wang, Tsai-Ning and Chen, Lin-Lin and Zeghidour, Neil and Saeed, Aaqib},
  booktitle = {Proceedings of the sixth Conference on Health, Inference, and Learning},
  year      = {2025},
  pages     = {231--246},
  volume    = {287},
  url       = {https://mlanthology.org/chil/2025/wang2025chil-careaqa/}
}