EchoDiffusion: Waveform Conditioned Diffusion Models for Echo-Based Depth Estimation

Abstract

Conventional echo-based depth estimation methods typically extract spatial information with encoder-decoder models such as UNet. However, these methods may struggle to extract fine details from echo waveforms and to perform multi-scale feature extraction with high precision. To address these challenges, we introduce EchoDiffusion, a framework that applies diffusion models conditioned on waveform embeddings to echo-based depth estimation. The framework employs the Multi-Scale Adaptive Latent Feature Network (MALF-Net) to extract multi-scale spatial features, fuse them adaptively, and encode the echo spectrograms into the latent space. We also propose the Echo Waveform Detail Embedder (EWDE), which uses a pre-trained Wav2Vec model to extract detailed spatial information from echo waveforms and supplies these details as conditional inputs that guide the reverse diffusion process in the latent space. Embedding the echo waveforms into the reverse diffusion process in this way guides the generation of depth maps more accurately. Extensive evaluations on the Replica and Matterport3D datasets show that EchoDiffusion achieves state-of-the-art performance in echo-based depth estimation.
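As a rough illustration of what conditioning a reverse diffusion process on an embedding means, the following minimal sketch runs standard DDPM ancestral sampling in a small latent space, with a condition vector passed to the noise predictor at every step. It is not the authors' implementation: the abstract does not specify the sampler, noise schedule, or network design, so the linear schedule, the `ToyNoisePredictor` class (a random, untrained stand-in for a learned network), and all dimensions are assumptions made purely for illustration. In the framework described above, the latent would come from MALF-Net and the condition from the Wav2Vec-based EWDE.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


class ToyNoisePredictor:
    """Hypothetical stand-in for a learned eps_theta(z_t, t, cond).

    A real model would be a trained network; this one uses fixed random
    weights only so that the sampling loop below can be executed.
    """

    def __init__(self, latent_dim, cond_dim, rng):
        self.w_z = rng.normal(scale=0.1, size=(latent_dim, latent_dim))
        self.w_c = rng.normal(scale=0.1, size=(cond_dim, latent_dim))

    def __call__(self, z_t, t, cond):
        # The toy model ignores the timestep t; a real predictor would embed it.
        return np.tanh(z_t @ self.w_z + cond @ self.w_c)


def reverse_diffusion(eps_model, cond, latent_dim, num_steps=50, seed=0):
    """DDPM ancestral sampling in latent space, guided by `cond` at each step."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    z = rng.standard_normal(latent_dim)  # start from pure Gaussian noise
    for t in range(num_steps - 1, -1, -1):
        eps = eps_model(z, t, cond)  # the condition enters here
        # Posterior mean of z_{t-1} given z_t and the predicted noise.
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step (t == 0).
        noise = rng.standard_normal(latent_dim) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = ToyNoisePredictor(latent_dim=8, cond_dim=4, rng=rng)
    waveform_embedding = np.ones(4)  # placeholder for an echo waveform embedding
    latent = reverse_diffusion(model, waveform_embedding, latent_dim=8)
    print(latent.shape)
```

Because the condition is fed to the noise predictor at every step, two different waveform embeddings steer the same initial noise toward different latents. In a full pipeline a decoder would then map the final latent to a depth map.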

Cite

Text

Zhang et al. "EchoDiffusion: Waveform Conditioned Diffusion Models for Echo-Based Depth Estimation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34416

Markdown

[Zhang et al. "EchoDiffusion: Waveform Conditioned Diffusion Models for Echo-Based Depth Estimation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhang2025aaai-echodiffusion/) doi:10.1609/AAAI.V39I21.34416

BibTeX

@inproceedings{zhang2025aaai-echodiffusion,
  title     = {{EchoDiffusion: Waveform Conditioned Diffusion Models for Echo-Based Depth Estimation}},
  author    = {Zhang, Wenjie and Yin, Jun and Ma, Long and Yu, Peng and Jiang, Xiaoheng and Tian, Zhen and Xu, Mingliang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {22578--22586},
  doi       = {10.1609/AAAI.V39I21.34416},
  url       = {https://mlanthology.org/aaai/2025/zhang2025aaai-echodiffusion/}
}