MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling

Abstract

High-quality textured mesh reconstruction from sparse-view images remains a fundamental challenge in computer graphics and computer vision. Traditional large reconstruction models operate in a single-scale manner, forcing the models to simultaneously capture global structure and local details, often resulting in compromised reconstructed shapes. In this work, we propose MS3D, a novel multi-scale 3D reconstruction framework. At its core, our method introduces a hierarchical structured latent representation for multi-scale modeling, coupled with a multi-scale feature extraction and integration mechanism. This enables progressive reconstruction, effectively decomposing the complex task of detailed geometry reconstruction into a sequence of easier steps. This coarse-to-fine approach effectively captures multi-frequency details, learns complex geometric patterns, and generalizes well across diverse objects while preserving fine-grained details. Extensive experiments demonstrate that MS3D outperforms state-of-the-art methods and is broadly applicable to both image- and text-to-3D generation. The entire pipeline reconstructs high-quality textured meshes in under five seconds.
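To make the coarse-to-fine idea concrete, here is a minimal illustrative sketch of progressive multi-scale latent refinement: a latent grid is reconstructed at the coarsest scale, then repeatedly upsampled and refined with a residual at each finer scale. This is a generic hypothetical illustration of the paradigm, not the MS3D architecture; the `upsample` helper, the scale schedule, and the random stand-in for a learned residual predictor are all assumptions for the sketch.

```python
import numpy as np

def upsample(latent, factor):
    # Nearest-neighbor upsampling of a 3D latent grid (illustrative only).
    return latent.repeat(factor, axis=0).repeat(factor, axis=1).repeat(factor, axis=2)

def coarse_to_fine(scales=(8, 16, 32)):
    """Generic coarse-to-fine refinement loop (hypothetical sketch, not MS3D).
    Each stage upsamples the previous scale's latent and adds a residual;
    in a real model the residual would come from a network conditioned on
    multi-scale image features."""
    rng = np.random.default_rng(0)
    latent = np.zeros((scales[0],) * 3)  # coarsest latent grid
    for i, res in enumerate(scales):
        if i > 0:
            latent = upsample(latent, res // scales[i - 1])
        residual = rng.standard_normal(latent.shape) * 0.1  # stand-in predictor
        latent = latent + residual
    return latent

final = coarse_to_fine()
print(final.shape)
```

The point of the decomposition is that each stage only has to explain the detail missing at its own scale, rather than the full geometry at once.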

Cite

Text

Luo and Zhang. "MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling." International Conference on Computer Vision, 2025.

Markdown

[Luo and Zhang. "MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/luo2025iccv-ms3d/)

BibTeX

@inproceedings{luo2025iccv-ms3d,
  title     = {{MS3D: High-Quality 3D Generation via Multi-Scale Representation Modeling}},
  author    = {Luo, Guan and Zhang, Jianfeng},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {26336--26348},
  url       = {https://mlanthology.org/iccv/2025/luo2025iccv-ms3d/}
}