CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders
Abstract
CLIPDraw is an algorithm that synthesizes novel drawings from natural language input. It does not require any additional training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, which biases drawings towards simpler, human-recognizable shapes. Results compare CLIPDraw with other synthesis-through-optimization methods and highlight several interesting behaviors of CLIPDraw.
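The sketch below illustrates the synthesis-through-optimization loop the abstract describes: stroke parameters are optimized so that a rendered drawing's CLIP image embedding matches the text embedding of the prompt. The CLIP calls follow the openai/CLIP package; `render_strokes` is a hypothetical placeholder for a differentiable vector rasterizer (the paper uses a library of that kind), so this is an illustrative sketch rather than the authors' exact implementation.

```python
# Illustrative CLIPDraw-style loop: optimize stroke parameters (not pixels)
# to maximize CLIP similarity with a text prompt.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

prompt = "a drawing of a cat"
text_features = model.encode_text(clip.tokenize([prompt]).to(device))
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# The optimization variables are Bezier control points, stroke widths,
# and RGBA colors -- not a pixel grid.
num_strokes = 64
points = torch.rand(num_strokes, 4, 2, device=device, requires_grad=True)
widths = torch.rand(num_strokes, device=device, requires_grad=True)
colors = torch.rand(num_strokes, 4, device=device, requires_grad=True)
optimizer = torch.optim.Adam([points, widths, colors], lr=0.1)

def render_strokes(points, widths, colors, size=224):
    """Hypothetical stand-in for a differentiable vector rasterizer.

    Should return a (1, 3, size, size) image tensor, normalized with
    CLIP's preprocessing statistics, with gradients flowing back to the
    stroke parameters.
    """
    raise NotImplementedError("replace with a real differentiable rasterizer")

for step in range(250):
    image = render_strokes(points, widths, colors)  # differentiable render
    # The paper applies random image augmentations here before encoding,
    # which regularizes the optimization; omitted in this sketch.
    image_features = model.encode_image(image)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    loss = -(image_features @ text_features.T).mean()  # maximize CLIP similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the gradients update stroke parameters rather than raw pixels, the optimization is biased toward the simpler, human-recognizable shapes noted in the abstract.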
Cite
Text
Frans et al. "CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders." Neural Information Processing Systems, 2022.
Markdown
[Frans et al. "CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/frans2022neurips-clipdraw/)
BibTeX
@inproceedings{frans2022neurips-clipdraw,
title = {{CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders}},
author = {Frans, Kevin and Soros, Lisa and Witkowski, Olaf},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/frans2022neurips-clipdraw/}
}