LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Abstract

Generating cognitive-aligned layered SVGs remains challenging: existing methods tend to produce either oversimplified single-layer outputs or optimization-induced shape redundancies. We propose LayerTracer, a Diffusion Transformer (DiT)-based framework that bridges this gap by learning designers' layered SVG creation processes from a novel dataset of sequential design operations. Our approach operates in two phases. First, a text-conditioned DiT generates multi-phase rasterized construction blueprints that simulate human design workflows. Second, layer-wise vectorization with path deduplication produces clean, editable SVGs. For image vectorization, we introduce a conditional diffusion mechanism that encodes reference images into latent tokens, guiding hierarchical reconstruction while preserving structural integrity. Extensive experiments show that LayerTracer surpasses optimization-based and neural baselines in both generation quality and editability.
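The second phase, layer-wise vectorization with path deduplication, can be illustrated with a minimal sketch. It assumes each construction stage has already been vectorized into a list of SVG path strings (a hypothetical data layout, not the paper's actual pipeline): deduplication keeps only the paths that are new at each stage and groups them into one SVG layer per stage.

# Hedged sketch of layer-wise path deduplication across construction stages.
# "stages" is a hypothetical representation: one list of SVG path strings per
# vectorized blueprint stage; later stages repeat earlier shapes, so only the
# paths that first appear at a given stage are kept in that layer.

from typing import List


def deduplicate_layers(stages: List[List[str]]) -> List[List[str]]:
    """Return, per stage, only the paths not seen in any earlier stage."""
    seen: set = set()
    layered: List[List[str]] = []
    for stage_paths in stages:
        new_paths = [p for p in stage_paths if p not in seen]
        seen.update(new_paths)
        layered.append(new_paths)
    return layered


def to_layered_svg(layers: List[List[str]], width: int = 512, height: int = 512) -> str:
    """Assemble deduplicated layers into a single SVG, one <g> group per layer."""
    groups = []
    for i, paths in enumerate(layers):
        body = "".join(f'<path d="{d}"/>' for d in paths)
        groups.append(f'<g id="layer-{i}">{body}</g>')
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">{"".join(groups)}</svg>'
    )


if __name__ == "__main__":
    # Toy example: stage 2 repeats the stage-1 square and adds a new bar,
    # so the bar alone ends up in layer-1.
    stages = [
        ["M10 10 H 90 V 90 H 10 Z"],
        ["M10 10 H 90 V 90 H 10 Z", "M20 40 H 80 V 60 H 20 Z"],
    ]
    print(to_layered_svg(deduplicate_layers(stages)))

This matches exact path strings only; the paper's pipeline presumably operates on the vectorized geometry itself, so the example is illustrative rather than a reimplementation.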

Cite

Text

Song et al. "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer." International Conference on Computer Vision, 2025.

Markdown

[Song et al. "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/song2025iccv-layertracer/)

BibTeX

@inproceedings{song2025iccv-layertracer,
  title     = {{LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer}},
  author    = {Song, Yiren and Chen, Danze and Shou, Mike Zheng},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {19731--19741},
  url       = {https://mlanthology.org/iccv/2025/song2025iccv-layertracer/}
}