Coarse-to-Fine Human Mesh Recovery with Transformers

Agarwal, Vatsal; Levy, Mara; Ehrlich, Max; Tang, Youbao; Zhang, Ning; Shrivastava, Abhinav

doi:10.1007/978-3-031-91575-8_18

ECCVW 2024 pp. 290-306

Abstract

The introduction of Transformer networks in computer vision has led to rapid progress across a variety of vision tasks. Recently, such networks have been applied with great success to human mesh recovery. While these works demonstrate remarkable performance, they suffer from high computational cost and slow inference due to the quadratic complexity of self-attention in the number of tokens. In this work, we propose a coarse-to-fine modeling approach to improve pipeline efficiency. Building on previous approaches, we adopt an encoder-decoder architecture to mine relationships among image, joint, and vertex features. While previous works apply attention to the full set of vertex features, our key insight is that earlier model layers do not require such dense vertex representations and can instead rely on a sparser set of features. We evaluate our approach on the Human3.6M and 3DPW datasets and find that our coarse-to-fine approach achieves improved or competitive performance with a 3.7x reduction in FLOPs and a 1.7x reduction in activation count compared to state-of-the-art approaches.
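The abstract's efficiency argument rests on self-attention cost growing with the square of the token count. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes a single-head attention decoder over concatenated joint and vertex tokens, omits image tokens and cross-attention, and upsamples vertex tokens between stages with random matrices. The vertex counts (27, 108, 431), the layer counts, and the names `decode`, `attention_flops`, and `rand_weights` are all hypothetical.

```python
import numpy as np


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over n tokens of width d."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])       # (n, n): cost grows quadratically in n
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v


def attention_flops(n, d):
    """FLOPs of the two n x n matmuls (Q @ K^T and attn @ V), counting 2 per multiply-add.
    Projections, softmax and MLP blocks are deliberately ignored."""
    return 2 * 2 * n * n * d


def decode(joints, verts, stage_weights, upsamplers, layers_per_stage):
    """Attend over [joint; vertex] tokens, upsampling vertex tokens after every stage but the last."""
    n_joints = joints.shape[0]
    total_flops = 0
    for stage, (wq, wk, wv) in enumerate(stage_weights):
        for _ in range(layers_per_stage):
            tokens = np.concatenate([joints, verts], axis=0)
            tokens = tokens + self_attention(tokens, wq, wk, wv)  # residual connection
            total_flops += attention_flops(*tokens.shape)
            joints, verts = tokens[:n_joints], tokens[n_joints:]
        if stage < len(upsamplers):
            verts = upsamplers[stage] @ verts  # (n_fine, n_coarse) @ (n_coarse, d) -> (n_fine, d)
    return joints, verts, total_flops


def rand_weights(rng, d):
    """Hypothetical random Q/K/V projections standing in for learned weights."""
    return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))


rng = np.random.default_rng(0)
d, n_joints = 64, 14
schedule = [27, 108, 431]  # hypothetical vertex-token counts, coarse to fine
joints = rng.normal(size=(n_joints, d))

# Coarse-to-fine: 2 layers per stage, vertex tokens upsampled between stages.
upsamplers = [rng.normal(size=(schedule[i + 1], schedule[i])) / schedule[i]
              for i in range(len(schedule) - 1)]
c2f_weights = [rand_weights(rng, d) for _ in schedule]
j_out, v_out, c2f_flops = decode(joints, rng.normal(size=(schedule[0], d)),
                                 c2f_weights, upsamplers, layers_per_stage=2)

# Dense baseline: the same 6 layers, all at the finest vertex resolution.
_, v_dense, dense_flops = decode(joints, rng.normal(size=(schedule[-1], d)),
                                 [rand_weights(rng, d)], [], layers_per_stage=6)

print(v_out.shape, v_dense.shape, dense_flops / c2f_flops)
```

Because the cost scales with the square of the token count, nearly all of the savings in this toy come from the early, coarse stages. A real model's ratio depends on its actual vertex counts, image tokens, and MLP blocks, none of which this sketch models.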

Cite

Text

Agarwal et al. "Coarse-to-Fine Human Mesh Recovery with Transformers." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91575-8_18

Markdown

[Agarwal et al. "Coarse-to-Fine Human Mesh Recovery with Transformers." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/agarwal2024eccvw-coarsetofine/) doi:10.1007/978-3-031-91575-8_18

BibTeX

@inproceedings{agarwal2024eccvw-coarsetofine,
  title     = {{Coarse-to-Fine Human Mesh Recovery with Transformers}},
  author    = {Agarwal, Vatsal and Levy, Mara and Ehrlich, Max and Tang, Youbao and Zhang, Ning and Shrivastava, Abhinav},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {290--306},
  doi       = {10.1007/978-3-031-91575-8_18},
  url       = {https://mlanthology.org/eccvw/2024/agarwal2024eccvw-coarsetofine/}
}