TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes

Abstract

Recent progress in text-driven 3D stylization of a single object has been driven largely by CLIP-based methods. However, stylizing multi-object 3D scenes remains difficult because the image-text pairs used to pre-train CLIP mostly depict a single object. Moreover, the local details of multiple objects are prone to omission because existing supervision relies primarily on a coarse-grained contrast between image-text pairs. To overcome these challenges, we present a novel framework, dubbed TeMO, that parses multi-object 3D scenes and edits their styles under contrastive supervision at multiple levels. We first propose a Decoupled Graph Attention (DGA) module to distinguishably reinforce the features of 3D surface points. In particular, a cross-modal graph is constructed to accurately align the object points and the noun phrases decoupled from the 3D mesh and the textual description, respectively. We then develop a Cross-Grained Contrast (CGC) supervision system, in which a fine-grained loss between the words in the textual description and the randomly rendered images is constructed to complement the coarse-grained loss. Extensive experiments show that our method synthesizes high-quality stylized content and outperforms existing methods over a wide range of multi-object 3D meshes.
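
The abstract does not give formulas for the two losses, so the following is only an illustrative sketch of how a coarse-grained and a fine-grained contrast term could be combined. Everything in it is an assumption rather than the authors' implementation: the function names, the use of `1 - cosine similarity` as the loss, the max-over-patches matching of each word, and the `weight` parameter are all hypothetical, and plain Python lists stand in for CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def coarse_loss(text_vec, image_vec):
    """Coarse-grained term (assumed form): whole sentence vs. whole rendered image."""
    return 1.0 - cosine(text_vec, image_vec)


def fine_loss(word_vecs, patch_vecs):
    """Fine-grained term (assumed form): each word is matched to its most
    similar image patch, and the best matches are averaged."""
    best = [max(cosine(w, p) for p in patch_vecs) for w in word_vecs]
    return 1.0 - sum(best) / len(best)


def cross_grained_loss(text_vec, image_vec, word_vecs, patch_vecs, weight=1.0):
    """Hypothetical combination: the fine-grained term complements the coarse one."""
    return coarse_loss(text_vec, image_vec) + weight * fine_loss(word_vecs, patch_vecs)
```

In this sketch, a rendering whose global embedding matches the sentence but whose patches match none of the individual words still incurs a penalty through `fine_loss`, which is the kind of local omission the abstract says coarse-grained supervision alone can miss.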

Cite

Text

Zhang et al. "TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01847

Markdown

[Zhang et al. "TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhang2024cvpr-temo/) doi:10.1109/CVPR52733.2024.01847

BibTeX

@inproceedings{zhang2024cvpr-temo,
  title     = {{TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes}},
  author    = {Zhang, Xuying and Yin, Bo-Wen and Chen, Yuming and Lin, Zheng and Li, Yunheng and Hou, Qibin and Cheng, Ming-Ming},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {19531--19540},
  doi       = {10.1109/CVPR52733.2024.01847},
  url       = {https://mlanthology.org/cvpr/2024/zhang2024cvpr-temo/}
}