DiffVL: Scaling up Soft Body Manipulation Using Vision-Language Driven Differentiable Physics

Abstract

Combining gradient-based trajectory optimization with differentiable physics simulation is an efficient technique for solving soft-body manipulation problems. Using a well-crafted optimization objective, the solver can quickly converge onto a valid trajectory. However, writing the appropriate objective functions requires expert knowledge, making it difficult to collect a large set of naturalistic problems from non-expert users. We introduce DiffVL, a method that enables non-expert users to communicate soft-body manipulation tasks -- a combination of vision and natural language, given in multiple stages -- that can be readily leveraged by a differentiable physics solver. We have developed GUI tools that enable non-expert users to specify 100 tasks inspired by real-life soft-body manipulations from online videos, which we will make public. We leverage large language models to translate task descriptions into machine-interpretable optimization objectives. The optimization objectives can help differentiable physics solvers to solve these long-horizon multistage tasks that are challenging for previous baselines.
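
To make the first sentence of the abstract concrete, the sketch below shows gradient-based trajectory optimization through a differentiable simulator in its simplest form. It is not DiffVL's solver or objective: the one-dimensional point mass stands in for a soft-body simulator, and every name here (`simulate`, `loss_and_grad`, `optimize`, the step size, the regularization weight) is a hypothetical choice made for illustration. The gradient is derived by hand with a reverse pass over the rollout, which is what an automatic-differentiation framework would do for a real simulator.

```python
# Toy illustration only: a 1D point mass replaces the soft-body simulator,
# and the objective is a hand-written "reach the target" loss.

def simulate(controls, dt=0.1):
    """Roll out the point mass from rest and return its final position."""
    x, v = 0.0, 0.0
    for u in controls:
        v += u * dt   # the control acts as an acceleration
        x += v * dt   # semi-implicit Euler position update
    return x

def loss_and_grad(controls, target, dt=0.1, reg=1e-3):
    """Return the loss and its gradient with respect to every control."""
    x_final = simulate(controls, dt)
    loss = (x_final - target) ** 2 + reg * sum(u * u for u in controls)

    # Reverse pass: propagate d(loss)/d(state) backwards through the rollout.
    grad_x = 2.0 * (x_final - target)
    grad_v = 0.0
    grads = [0.0] * len(controls)
    for t in reversed(range(len(controls))):
        grad_v += grad_x * dt                              # through x += v * dt
        grads[t] = grad_v * dt + 2.0 * reg * controls[t]   # through v += u * dt
    return loss, grads

def optimize(target=1.0, horizon=20, steps=200, lr=1.0):
    """Plain gradient descent on the control sequence."""
    controls = [0.0] * horizon
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grad(controls, target)
        history.append(loss)
        controls = [u - lr * g for u, g in zip(controls, grads)]
    return controls, history

controls, history = optimize()
final_x = simulate(controls)
```

The loss function is the only part of this loop that encodes the task. That is the piece the abstract says requires expert knowledge to write, and the piece DiffVL proposes to generate from vision and language input.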

Cite

Text

Huang et al. "DiffVL: Scaling up Soft Body Manipulation Using Vision-Language Driven Differentiable Physics." Neural Information Processing Systems, 2023.

Markdown

[Huang et al. "DiffVL: Scaling up Soft Body Manipulation Using Vision-Language Driven Differentiable Physics." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/huang2023neurips-diffvl/)

BibTeX

@inproceedings{huang2023neurips-diffvl,
  title     = {{DiffVL: Scaling up Soft Body Manipulation Using Vision-Language Driven Differentiable Physics}},
  author    = {Huang, Zhiao and Chen, Feng and Pu, Yewen and Lin, Chunru and Su, Hao and Gan, Chuang},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/huang2023neurips-diffvl/}
}