LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion

Abstract

Current research on generating 3D hand-object interaction motion primarily focuses on in-domain objects. Generalization to unseen objects is essential for practical applications, yet it remains both challenging and largely unexplored. In this paper, we propose LatentHOI, a novel approach designed to tackle the challenges of generalizing hand-object interaction to unseen objects. Our main insight lies in decoupling high-level temporal motion from fine-grained spatial hand-object interactions with a latent diffusion model coupled with a Grasping Variational Autoencoder (GraspVAE). This configuration not only enhances the conditional dependency between spatial grasp and temporal motion but also improves data utilization and reduces overfitting through regularization in the latent space. We conducted extensive experiments in an unseen-object setting on both single-hand grasping and bimanual motion datasets, including GRAB, DexYCB, and OakInk. Quantitative and qualitative evaluations demonstrate that our method significantly enhances the realism and physical plausibility of generated motions for unseen objects, in both single-hand and bimanual manipulation, compared to the state of the art.

Cite

Text

Li et al. "LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01623

Markdown

[Li et al. "LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/li2025cvpr-latenthoi/) doi:10.1109/CVPR52734.2025.01623

BibTeX

@inproceedings{li2025cvpr-latenthoi,
  title     = {{LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion}},
  author    = {Li, Muchen and Christen, Sammy and Wan, Chengde and Cai, Yujun and Liao, Renjie and Sigal, Leonid and Ma, Shugao},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {17416--17425},
  doi       = {10.1109/CVPR52734.2025.01623},
  url       = {https://mlanthology.org/cvpr/2025/li2025cvpr-latenthoi/}
}