GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation

Abstract

Recent advancements in diffusion models and large-scale datasets have revolutionized image and video generation, with increasing focus on 3D content generation. While existing methods show promise, they face challenges in input formats, latent space structures, and output representations. This paper introduces a novel 3D generation framework that addresses these issues, enabling scalable and high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our approach utilizes a VAE with multi-view posed RGB-D-N renderings as input, features a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent flow-based model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single-view image inputs. Experimental results demonstrate superior performance on various datasets, advancing the state-of-the-art in 3D content generation.
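As a rough illustration of the flow-matching idea named in the title, the following is a minimal sketch of a generic straight-path (rectified-flow-style) objective and an Euler sampler over a flattened latent vector. It is not the paper's cascaded model: the abstract does not specify the interpolation path, the network, or the conditioning, so the linear path and every function name here (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`) are assumptions made for illustration only.

```python
def interpolate(x0, x1, t):
    """Point on the straight path between noise x0 and data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity field along the straight path (assumed path)."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between the predicted and target velocity at x_t."""
    xt = interpolate(x0, x1, t)
    pred = predict_velocity(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(target)


def euler_sample(predict_velocity, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Hypothetical toy data: a 3-value "latent" standing in for point cloud coordinates.
    noise = [0.0, 1.0, -2.0]
    data = [1.0, 3.0, 0.5]

    # An oracle velocity field stands in for a trained network in this sketch.
    def oracle(x, t):
        return target_velocity(noise, data)

    print(flow_matching_loss(oracle, noise, data, 0.3))
    print(euler_sample(oracle, noise, steps=10))
```

In practice the oracle would be replaced by a learned network conditioned on the input modality (point cloud, caption, or image), and the loss would be averaged over random times and noise samples; those details are outside what the abstract states.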

Cite

Text

Lan et al. "GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation." International Conference on Learning Representations, 2025.

Markdown

[Lan et al. "GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/lan2025iclr-gaussiananything/)

BibTeX

@inproceedings{lan2025iclr-gaussiananything,
  title     = {{GaussianAnything: Interactive Point Cloud Flow Matching for 3D Generation}},
  author    = {Lan, Yushi and Zhou, Shangchen and Lyu, Zhaoyang and Hong, Fangzhou and Yang, Shuai and Dai, Bo and Pan, Xingang and Loy, Chen Change},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/lan2025iclr-gaussiananything/}
}