An Image-like Diffusion Method for Human-Object Interaction Detection
Abstract
Human-object interaction (HOI) detection often faces high levels of ambiguity and indeterminacy, as the same interaction can appear vastly different across human-object pairs. This indeterminacy can be further exacerbated by occlusions and cluttered backgrounds. To handle such a challenging task, we begin with a key observation: the output of HOI detection for each human-object pair can be recast as an image. Inspired by the strong generative capabilities of image diffusion models, we therefore propose a new framework, HOI-IDiff, which tackles HOI detection from a novel perspective, using an Image-like Diffusion process to generate HOI detection outputs as images. Furthermore, because these recast images differ in certain properties from natural images, we enhance the framework with a customized HOI diffusion process and a slice patchification model architecture, both specifically tailored to generating the recast "HOI images". Extensive experiments demonstrate the efficacy of our framework.
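The abstract does not specify how an HOI output is laid out as an image, nor the details of the customized HOI diffusion process. As a point of reference only, the sketch below shows the standard DDPM forward (noising) step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, that image diffusion models are built on. The linear beta schedule, the function names, and the 4x4 array standing in for a recast "HOI image" are all hypothetical illustrations, not the paper's encoding or its customized process.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Standard DDPM-style linear noise schedule (an assumption, not HOI-IDiff's)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) for a 2D 'image' stored as nested lists."""
    a = alpha_bars[t]
    signal, noise_scale = math.sqrt(a), math.sqrt(1.0 - a)
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    xt = [[signal * v + noise_scale * n for v, n in zip(row, nrow)]
          for row, nrow in zip(x0, noise)]
    return xt, noise

# Usage: noise a hypothetical 4x4 "HOI image" with entries in [-1, 1].
rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
hoi_image = [[1.0 if (r + c) % 2 == 0 else -1.0 for c in range(4)] for r in range(4)]
x_t, eps = q_sample(hoi_image, t=500, alpha_bars=alpha_bars, rng=rng)
```

A generative model is then trained to reverse this noising, so that at inference time it produces a clean "image" from pure noise; HOI-IDiff's customized process departs from this baseline in ways the abstract does not detail.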
Cite
Hui et al. "An Image-like Diffusion Method for Human-Object Interaction Detection." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01307
[Hui et al. "An Image-like Diffusion Method for Human-Object Interaction Detection." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/hui2025cvpr-imagelike/) doi:10.1109/CVPR52734.2025.01307
@inproceedings{hui2025cvpr-imagelike,
title = {{An Image-like Diffusion Method for Human-Object Interaction Detection}},
author = {Hui, Xiaofei and Qu, Haoxuan and Rahmani, Hossein and Liu, Jun},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {14002-14012},
doi = {10.1109/CVPR52734.2025.01307},
url = {https://mlanthology.org/cvpr/2025/hui2025cvpr-imagelike/}
}