Point-MaDi: Masked Autoencoding with Diffusion for Point Cloud Pre-Training

Abstract

Self-supervised pre-training is essential for 3D point cloud representation learning, as annotating these irregular, topology-free structures is costly and labor-intensive. Masked autoencoders (MAEs) offer a promising framework but rely on explicit positional embeddings, such as patch center coordinates, which leak geometric information and limit data-driven structural learning. In this work, we propose Point-MaDi, a novel Point cloud Masked autoencoding Diffusion framework for pre-training that integrates a dual-diffusion pretext task into an MAE architecture to address this issue. Specifically, we introduce a center diffusion mechanism in the encoder that noises and predicts the coordinates of both visible and masked patch centers without ground-truth positional embeddings. The predicted centers are processed by a transformer with self-attention and cross-attention to capture intra- and inter-patch relationships. In the decoder, we design a conditional patch diffusion process, guided by the encoder's latent features and the predicted centers, to reconstruct masked patches directly from noise. This dual-diffusion design drives the learning of comprehensive global semantic and local geometric representations during pre-training, eliminating the need for external geometric priors. Extensive experiments on ScanObjectNN, ModelNet40, ShapeNetPart, S3DIS, and ScanNet demonstrate that Point-MaDi achieves superior performance across downstream tasks, surpassing Point-MAE by 5.51% on OBJ-BG, 5.17% on OBJ-ONLY, and 4.34% on PB-T50-RS for 3D object classification on ScanObjectNN.
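Below is a minimal PyTorch sketch of the two pretext tasks the abstract describes: center diffusion in the encoder and conditional patch diffusion in the decoder. All module names, dimensions, the noise schedule, and the single-pass denoisers are illustrative assumptions rather than the authors' implementation; patch masking and the full iterative reverse-diffusion chain are omitted for brevity.

import torch
import torch.nn as nn


class CenterDiffusionEncoder(nn.Module):
    """Noises the patch-center coordinates and predicts them back, so the
    transformer never sees ground-truth positional embeddings."""

    def __init__(self, dim=384, num_steps=100, points_per_patch=32):
        super().__init__()
        self.num_steps = num_steps
        # Linear beta schedule (an assumption; the paper may use a different one).
        betas = torch.linspace(1e-4, 2e-2, num_steps)
        self.register_buffer("alphas_bar", torch.cumprod(1.0 - betas, dim=0))
        # Single-pass denoiser; timestep conditioning is omitted for brevity.
        self.center_denoiser = nn.Sequential(nn.Linear(3, dim), nn.GELU(), nn.Linear(dim, 3))
        self.patch_embed = nn.Linear(3 * points_per_patch, dim)
        self.center_embed = nn.Linear(3, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=6, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, patches, centers):
        # patches: (B, G, P, 3) grouped local patches, centers: (B, G, 3)
        B = centers.shape[0]
        t = torch.randint(0, self.num_steps, (B,), device=centers.device)
        a_bar = self.alphas_bar[t].view(B, 1, 1)
        noise = torch.randn_like(centers)
        noisy_centers = a_bar.sqrt() * centers + (1.0 - a_bar).sqrt() * noise  # forward diffusion
        pred_centers = self.center_denoiser(noisy_centers)                     # predict clean centers
        tokens = self.patch_embed(patches.flatten(2)) + self.center_embed(pred_centers)
        latent = self.blocks(tokens)                                           # intra-/inter-patch attention
        center_loss = ((pred_centers - centers) ** 2).mean()
        return latent, pred_centers, center_loss


class ConditionalPatchDecoder(nn.Module):
    """Reconstructs patches from pure noise, conditioned on encoder latents
    and predicted centers via cross-attention."""

    def __init__(self, dim=384, points_per_patch=32):
        super().__init__()
        self.noise_embed = nn.Linear(3 * points_per_patch, dim)
        self.cond_embed = nn.Linear(dim + 3, dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=6, batch_first=True)
        self.head = nn.Linear(dim, 3 * points_per_patch)

    def forward(self, latent, pred_centers, target_patches):
        # latent: (B, G, dim), pred_centers: (B, G, 3), target_patches: (B, G, P, 3)
        B, G, P, _ = target_patches.shape
        noise = torch.randn(B, G, 3 * P, device=latent.device)
        query = self.noise_embed(noise)                                   # queries start from noise
        cond = self.cond_embed(torch.cat([latent, pred_centers], dim=-1))
        feats, _ = self.cross_attn(query, cond, cond)
        recon = self.head(feats).view(B, G, P, 3)
        patch_loss = ((recon - target_patches) ** 2).mean()
        return recon, patch_loss


# Toy pre-training step on random data (shapes are illustrative).
encoder, decoder = CenterDiffusionEncoder(), ConditionalPatchDecoder()
patches = torch.randn(2, 64, 32, 3)      # 64 patches of 32 points each
centers = patches.mean(dim=2)            # patch centers
latent, pred_centers, center_loss = encoder(patches, centers)
recon, patch_loss = decoder(latent, pred_centers, patches)
loss = center_loss + patch_loss          # joint dual-diffusion objective

In this sketch both losses are simple mean-squared errors on the clean targets; the key point is that positional information enters the encoder only through the centers it predicts itself, and the decoder's reconstruction is conditioned on those predictions rather than on ground-truth coordinates.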

Cite

Text

Xiao et al. "Point-MaDi: Masked Autoencoding with Diffusion for Point Cloud Pre-Training." Advances in Neural Information Processing Systems, 2025.

Markdown

[Xiao et al. "Point-MaDi: Masked Autoencoding with Diffusion for Point Cloud Pre-Training." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/xiao2025neurips-pointmadi/)

BibTeX

@inproceedings{xiao2025neurips-pointmadi,
  title     = {{Point-MaDi: Masked Autoencoding with Diffusion for Point Cloud Pre-Training}},
  author    = {Xiao, Xiaoyang and Yao, Runzhao and Tian, Zhiqiang and Du, Shaoyi},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/xiao2025neurips-pointmadi/}
}