Text2Data: Low-Resource Data Generation with Textual Control

Abstract

The machine learning community has invested considerable effort in generating data that is semantically coherent with textual instructions. However, low-resource domains characterized by expensive annotations or complex data structures, such as molecules, motion dynamics, and time series, often lack textual labels. This scarcity impedes supervised learning and thereby limits the application of advanced generative models to text-to-data tasks. To address these challenges, we propose Text2Data, a novel approach that first uses unlabeled data to learn the underlying data distribution through an unsupervised diffusion model, and then finetunes the model with a constrained-optimization-based learning objective to ensure controllability. Comprehensive experiments show that Text2Data achieves better controllability than existing baselines across several modalities, including molecules, motions, and time series.
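
The two-stage idea in the abstract (learn the data distribution first, then finetune under a constraint) can be illustrated with a toy sketch. This is not the authors' implementation: the scalar parameter, the two quadratic losses, the threshold `xi`, and the penalty-style update are all hypothetical stand-ins chosen only to show how a constraint can keep finetuning close to what was learned without labels.

```python
def finetune_with_constraint(theta, xi, steps=200, lr=0.05, penalty=1.0):
    """Toy constrained finetuning on a single scalar parameter.

    Hypothetical stand-ins (assumptions, not taken from the paper):
      uncond_loss(theta) = theta**2        minimized at the "pretrained" value 0
      ctrl_loss(theta)   = (theta - 2)**2  the "controllability" objective
    Goal: reduce ctrl_loss while keeping uncond_loss <= xi.
    """
    for _ in range(steps):
        grad = 2.0 * (theta - 2.0)          # gradient of ctrl_loss
        if theta ** 2 > xi:                 # constraint is violated
            grad += penalty * 2.0 * theta   # also descend on uncond_loss
        theta -= lr * grad
    return theta


# Stage 1 is assumed to have produced the unconditional optimum theta = 0.
theta_pretrained = 0.0

# Stage 2 with the constraint uncond_loss <= 1, and without any constraint.
theta_constrained = finetune_with_constraint(theta_pretrained, xi=1.0)
theta_unconstrained = finetune_with_constraint(theta_pretrained, xi=float("inf"))

print(theta_constrained, theta_unconstrained)
```

In this toy setting the constrained run settles near the constraint boundary (theta close to 1), while the unconstrained run moves all the way to the minimizer of the controllability loss (theta close to 2) and abandons the pretrained solution.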

Cite

Text

Wang et al. "Text2Data: Low-Resource Data Generation with Textual Control." ICLR 2024 Workshops: PML4LRS, 2024.

Markdown

[Wang et al. "Text2Data: Low-Resource Data Generation with Textual Control." ICLR 2024 Workshops: PML4LRS, 2024.](https://mlanthology.org/iclrw/2024/wang2024iclrw-text2data/)

BibTeX

@inproceedings{wang2024iclrw-text2data,
  title     = {{Text2Data: Low-Resource Data Generation with Textual Control}},
  author    = {Wang, Shiyu and Feng, Yihao and Lan, Tian and Yu, Ning and Bai, Yu and Xu, Ran and Wang, Huan and Xiong, Caiming and Savarese, Silvio},
  booktitle = {ICLR 2024 Workshops: PML4LRS},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/wang2024iclrw-text2data/}
}