Self-Supervised End-to-End ToF Imaging Based on RGB-D Cross-Modal Dependency

Abstract

Time-of-Flight (ToF) imaging systems are susceptible to various types of noise and degradation, which can severely affect image quality. Traditional sequential imaging pipelines often suffer from error accumulation because each stage is processed separately. Existing end-to-end methods typically rely on noisy-clean depth image pairs for supervised learning. However, acquiring ground-truth depth is challenging in real-world scenarios due to factors such as Multi-Path Interference (MPI), phase wrapping, and complex noise patterns. In this paper, we propose a self-supervised learning framework for end-to-end ToF imaging that does not require any noisy-clean pairs yet generalizes well across various off-the-shelf cameras. Our framework leverages the cross-modal dependencies between RGB and depth data as implicit supervision to effectively suppress noise and maintain image fidelity. Additionally, the loss function integrates the statistical characteristics of raw measurement data, enhancing robustness against noise and artifacts. Extensive experiments on both synthetic and real-world data demonstrate that our approach achieves performance comparable to supervised methods. Furthermore, our method consistently delivers strong performance across all evaluated cameras, highlighting its generalization capability. The code is available at https://github.com/WeihangWANG/RGBD_imaging.

Cite

Text

Wang et al. "Self-Supervised End-to-End ToF Imaging Based on RGB-D Cross-Modal Dependency." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/221

Markdown

[Wang et al. "Self-Supervised End-to-End ToF Imaging Based on RGB-D Cross-Modal Dependency." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/wang2025ijcai-self/) doi:10.24963/IJCAI.2025/221

BibTeX

@inproceedings{wang2025ijcai-self,
  title     = {{Self-Supervised End-to-End ToF Imaging Based on RGB-D Cross-Modal Dependency}},
  author    = {Wang, Weihang and Wang, Jun and Wen, Fei},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {1982--1990},
  doi       = {10.24963/IJCAI.2025/221},
  url       = {https://mlanthology.org/ijcai/2025/wang2025ijcai-self/}
}