Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping

Abstract

One-shot transfer of dexterous grasps to novel scenes with object and context variations has been a challenging problem. While distilled feature fields from large vision models have enabled semantic correspondences across 3D scenes, their features are point-based and restricted to object surfaces, limiting their ability to model complex semantic feature distributions for hand-object interactions. In this work, we propose the *neural attention field* for representing semantics-aware dense feature fields in 3D space by modeling inter-point relevance instead of individual point features. At its core is a transformer decoder that computes the cross-attention between any 3D query point and all the scene points, and produces the query point feature through attention-based aggregation. We further propose a self-supervised framework for training the transformer decoder from only a few 3D point clouds without hand demonstrations. After training, the attention field can be applied to novel scenes for semantics-aware dexterous grasping from a one-shot demonstration. Experiments show that our method provides better optimization landscapes by encouraging the end-effector to focus on task-relevant scene regions, resulting in significant improvements in success rates on real robots compared with feature-field-based methods.
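
The aggregation step described in the abstract can be illustrated with a minimal single-head cross-attention sketch in NumPy. This is not the authors' implementation: the function name `attention_field_feature`, the projection matrices `W_q` and `W_k`, and the choice to embed raw coordinates with a linear map are all assumptions made for illustration. The only parts taken from the abstract are that a 3D query point attends to all scene points and that its feature is an attention-weighted aggregate of per-point scene features.

```python
import numpy as np

def attention_field_feature(query_xyz, scene_xyz, scene_feats, W_q, W_k):
    """Illustrative single-head cross-attention for one 3D query point.

    query_xyz:   (3,)   query location anywhere in space
    scene_xyz:   (N, 3) scene point coordinates
    scene_feats: (N, C) per-point features (e.g. distilled from a vision model)
    W_q, W_k:    (3, d) hypothetical projections; random here, not learned
    """
    q = query_xyz @ W_q                    # (d,)  query embedding
    k = scene_xyz @ W_k                    # (N, d) key embeddings
    logits = k @ q / np.sqrt(q.shape[0])   # (N,)  scaled dot-product scores
    logits = logits - logits.max()         # shift for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()      # softmax over scene points
    feature = weights @ scene_feats        # (C,)  attention-weighted aggregate
    return weights, feature

# Toy usage with random data (all values are placeholders).
rng = np.random.default_rng(0)
scene_xyz = rng.normal(size=(128, 3))
scene_feats = rng.normal(size=(128, 16))
W_q = rng.normal(size=(3, 8))
W_k = rng.normal(size=(3, 8))
weights, feature = attention_field_feature(
    np.array([0.1, -0.2, 0.3]), scene_xyz, scene_feats, W_q, W_k
)
```

Because the weights form a probability distribution over scene points, the query point need not lie on an object surface: its feature is defined anywhere in space by how relevant each scene point is to it, which is the property the abstract contrasts with surface-bound feature fields.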

Cite

Text

Wang et al. "Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping." Proceedings of The 8th Conference on Robot Learning, 2024.

Markdown

[Wang et al. "Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping." Proceedings of The 8th Conference on Robot Learning, 2024.](https://mlanthology.org/corl/2024/wang2024corl-neural/)

BibTeX

@inproceedings{wang2024corl-neural,
  title     = {{Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping}},
  author    = {Wang, Qianxu and Deng, Congyue and Lum, Tyler Ga Wei and Chen, Yuanpei and Yang, Yaodong and Bohg, Jeannette and Zhu, Yixin and Guibas, Leonidas},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  year      = {2024},
  pages     = {4495--4508},
  volume    = {270},
  url       = {https://mlanthology.org/corl/2024/wang2024corl-neural/}
}