Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis

Abstract

Deep learning-based respiratory auscultation is currently hindered by two fundamental challenges: (i) inherent information loss, as converting signals into spectrograms discards transient acoustic events and clinical context; and (ii) limited data availability, exacerbated by severe class imbalance. To bridge these gaps, we present **_Resp-Agent_**, an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A²CA). Unlike static pipelines, Thinker-A²CA serves as a central controller that actively identifies diagnostic weaknesses and schedules targeted synthesis in a closed loop. To address the representation gap, we introduce a modality-weaving Diagnoser that interleaves clinical text with audio tokens via strategic global attention and sparse audio anchors, capturing both long-range clinical context and millisecond-level transients. To address the data gap, we design a flow-matching Generator that adapts a text-only Large Language Model (LLM) via modality injection, decoupling pathological content from acoustic style to synthesize hard-to-diagnose samples. As a foundation for this work, we introduce **_Resp-229k_**, a benchmark corpus of 229k recordings paired with LLM-distilled clinical narratives. Extensive experiments demonstrate that Resp-Agent consistently outperforms prior approaches across diverse evaluation settings, improving diagnostic robustness under data scarcity and long-tailed class imbalance. Our code and data are available at [https://github.com/zpforlove/Resp-Agent](https://github.com/zpforlove/Resp-Agent).
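The closed loop described above (identify diagnostic weaknesses, then schedule targeted synthesis) can be illustrated with a minimal sketch. This is not the authors' Thinker-A²CA implementation: the function names, the use of per-class recall as the weakness signal, the proportional budget rule, and the class labels are all assumptions made for illustration only.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that appears in y_true."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / total[label] for label in total}


def schedule_synthesis(recall, budget):
    """Split a synthesis budget across classes in proportion to (1 - recall).

    Hypothetical scheduling rule: classes the diagnoser misses more often
    receive more synthetic samples in the next round.
    """
    deficit = {label: 1.0 - r for label, r in recall.items()}
    norm = sum(deficit.values())
    if norm == 0:  # nothing is misclassified, so nothing needs synthesis
        return {label: 0 for label in recall}
    return {label: round(budget * d / norm) for label, d in deficit.items()}


# Illustrative labels and predictions (assumed, not taken from Resp-229k).
y_true = ["copd", "copd", "asthma", "asthma", "healthy", "healthy"]
y_pred = ["copd", "copd", "healthy", "asthma", "healthy", "healthy"]

recall = per_class_recall(y_true, y_pred)
plan = schedule_synthesis(recall, budget=10)
```

In a full loop, the generator would produce the number of samples listed in `plan` for each class, the diagnoser would be retrained on the augmented data, and the evaluation step would run again.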

Cite

Text

Zhang et al. "Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-respagent/)

BibTeX

@inproceedings{zhang2026iclr-respagent,
  title     = {{Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis}},
  author    = {Zhang, Pengfei and Xie, Tianxin and Yang, Minghao and Liu, Li},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-respagent/}
}