EVE: Efficient Zero-Shot Text-Based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

Abstract

The dynamic movement of the human body presents a fundamental challenge for human pose estimation and body segmentation. State-of-the-art approaches primarily rely on combining keypoint heatmaps with segmentation masks, but often struggle in scenarios involving overlapping joints during pose estimation or rapidly changing poses for instance-level segmentation. To address these limitations, we leverage Keypoints as Dynamic Centroid (KDC), a new centroid-based representation for unified human pose estimation and instance-level segmentation. KDC adopts a bottom-up paradigm to generate keypoint heatmaps for easily distinguishable and complex keypoints, and improves keypoint detection and confidence scores by introducing KeyCentroids using a keypoint disk. It leverages high-confidence keypoints as dynamic centroids in the embedding space to generate MaskCentroids, allowing for the swift clustering of pixels to specific human instances during rapid changes in human body movements in a live environment. Our experimental evaluations focus on crowded and occluded cases using the CrowdPose, OCHuman, and COCO benchmarks, demonstrating KDC’s effectiveness and generalizability in challenging scenarios in terms of both accuracy and runtime performance. Our implementation is available at https://sites.google.com/view/niazahmad/projects/kdc.

Cite

Text

Chen et al. "EVE: Efficient Zero-Shot Text-Based Video Editing with Depth Map Guidance and Temporal Consistency Constraints." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/75

Markdown

[Chen et al. "EVE: Efficient Zero-Shot Text-Based Video Editing with Depth Map Guidance and Temporal Consistency Constraints." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/chen2024ijcai-eve/) doi:10.24963/ijcai.2024/75

BibTeX

@inproceedings{chen2024ijcai-eve,
  title     = {{EVE: Efficient Zero-Shot Text-Based Video Editing with Depth Map Guidance and Temporal Consistency Constraints}},
  author    = {Chen, Yutao and Dong, Xingning and Gan, Tian and Zhou, Chunluan and Yang, Ming and Guo, Qingpei},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {677--685},
  doi       = {10.24963/ijcai.2024/75},
  url       = {https://mlanthology.org/ijcai/2024/chen2024ijcai-eve/}
}