CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders

Abstract

CLIPDraw is an algorithm that synthesizes novel drawings from natural language input. It does not require any additional training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, which biases drawings towards simpler, human-recognizable shapes. Results compare CLIPDraw with other synthesis-through-optimization methods and highlight various interesting behaviors of CLIPDraw.
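
As a rough illustration of the synthesis-through-optimization idea described in the abstract, the Python sketch below optimizes drawing parameters so that the CLIP embedding of the rendered canvas matches the embedding of a text prompt. It is a minimal sketch, not the paper's implementation: it uses the public OpenAI clip package, but it replaces the paper's diffvg-based Bézier-stroke rasterizer with a crude, hypothetical render_strokes stand-in (soft blobs at control points), and the stroke count, augmentations, learning rate, and step count are illustrative choices.

# Hedged sketch of a CLIPDraw-style optimization loop. The actual method
# optimizes Bezier stroke parameters rendered with diffvg; "render_strokes"
# below is a placeholder used only so this sketch is self-contained.
import torch
import torch.nn.functional as F
import torchvision.transforms as T
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device, jit=False)
model = model.float().eval()  # keep everything in fp32 for simple gradients

prompt = "a drawing of a cat"
with torch.no_grad():
    tokens = clip.tokenize([prompt]).to(device)
    text_features = F.normalize(model.encode_text(tokens), dim=-1)

# Drawing parameters: control-point positions and a colour per stroke.
num_strokes, points_per_stroke, size = 16, 4, 224
points = torch.rand(num_strokes, points_per_stroke, 2, device=device, requires_grad=True)
colors = torch.rand(num_strokes, 3, device=device, requires_grad=True)

ys, xs = torch.meshgrid(torch.linspace(0, 1, size, device=device),
                        torch.linspace(0, 1, size, device=device), indexing="ij")
grid = torch.stack([xs, ys], dim=-1)  # (H, W, 2) pixel coordinates in [0, 1]

def render_strokes(points, colors, sigma=0.02):
    # Placeholder differentiable rasterizer: splat a soft blob at each control
    # point onto a white canvas (the paper instead renders curved vector strokes).
    canvas = torch.ones(3, size, size, device=device)
    for s in range(num_strokes):
        for p in range(points_per_stroke):
            d2 = ((grid - points[s, p]) ** 2).sum(-1)
            alpha = torch.exp(-d2 / (2 * sigma ** 2))
            canvas = canvas * (1 - alpha) + colors[s].sigmoid()[:, None, None] * alpha
    return canvas.unsqueeze(0)  # (1, 3, H, W)

augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
    T.Normalize((0.48145466, 0.4578275, 0.40821073),
                (0.26862954, 0.26130258, 0.27577711)),  # CLIP preprocessing stats
])

optimizer = torch.optim.Adam([points, colors], lr=0.01)
for step in range(250):
    image = render_strokes(points, colors)
    # Averaging over augmented views discourages adversarial, non-recognizable solutions.
    batch = torch.cat([augment(image) for _ in range(4)])
    image_features = F.normalize(model.encode_image(batch), dim=-1)
    loss = -(image_features @ text_features.T).mean()  # maximize CLIP similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()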

Cite

Text

Frans et al. "CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders." Neural Information Processing Systems, 2022.

Markdown

[Frans et al. "CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/frans2022neurips-clipdraw/)

BibTeX

@inproceedings{frans2022neurips-clipdraw,
  title     = {{CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders}},
  author    = {Frans, Kevin and Soros, Lisa and Witkowski, Olaf},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/frans2022neurips-clipdraw/}
}