EPD: Long-Term Memory Extraction, Context-Aware Planning and Multi-Iteration Decision @ EgoPlan Challenge ICML 2024
Abstract
In this technical report, we present our solution for the EgoPlan Challenge at ICML 2024. To address real-world egocentric task planning, we introduce EPD, a novel planning framework with three stages: long-term memory Extraction, context-aware Planning, and multi-iteration Decision. Given the task goal, task progress, and current observation, the extraction model first extracts task-relevant memory information from the progress video, condensing the long, complex video into a summary of that memory. The planning model then combines this memory context with fine-grained visual information from the current observation to predict the next action. Finally, through multi-iteration decision-making, the decision model integrates its understanding of the overall task situation and the current state to reach the most realistic planning decision. On the EgoPlan-Test set, EPD achieves a planning accuracy of 53.85% over 1,584 egocentric task planning questions. All code is available at https://github.com/Kkskkkskr/EPD .
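The three-stage flow described above can be illustrated with a toy sketch. It does not reproduce the EPD implementation. It assumes text captions stand in for video frames, word overlap stands in for the extraction and planning models, and a majority vote over several context views (observation only, memory only, both) stands in for the multi-iteration decision model. The names `extract_memory`, `plan`, `decide`, and the example data are hypothetical.

```python
from collections import Counter
from typing import List

# Hypothetical stand-ins: text captions replace video frames, and simple
# word overlap replaces the learned extraction/planning/decision models.
STOP = {"the", "a", "an", "to", "of", "in", "on", "and", "with", "from", "into"}


def tokens(text: str) -> set:
    """Lower-cased content words of a caption (stopwords removed)."""
    return {w for w in text.lower().split() if w not in STOP}


def extract_memory(goal: str, progress: List[str]) -> List[str]:
    """Stage 1 (toy): keep progress steps that share a content word with the goal."""
    goal_words = tokens(goal)
    return [step for step in progress if tokens(step) & goal_words]


def plan(goal: str, memory: List[str], observation: str,
         candidates: List[str], view: str) -> str:
    """Stage 2 (toy): score each candidate action against one context view."""
    context = tokens(goal)
    if view in ("memory", "both"):
        for step in memory:
            context |= tokens(step)
    if view in ("observation", "both"):
        context |= tokens(observation)

    def score(action: str) -> int:
        # Reward overlap with the context; penalize repeating a step already done.
        overlap = len(tokens(action) & context)
        return overlap - (10 if action in memory else 0)

    return max(candidates, key=score)  # the first candidate wins ties


def decide(goal: str, progress: List[str], observation: str,
           candidates: List[str],
           views=("observation", "memory", "both")) -> str:
    """Stage 3 (toy): plan once per view and return the majority choice."""
    memory = extract_memory(goal, progress)
    votes = Counter(plan(goal, memory, observation, candidates, v) for v in views)
    return votes.most_common(1)[0][0]


progress = ["fill the kettle with water", "check phone",
            "boil water in the kettle", "take tea bag from box"]
candidates = ["fill the kettle with water", "put tea bag in cup",
              "open the fridge", "pour water into cup"]
print(extract_memory("make tea in the kettle", progress))
print(decide("make tea in the kettle", progress,
             "tea bag and empty cup on counter", candidates))
# → put tea bag in cup
```

In this toy run, the irrelevant step "check phone" is dropped from memory, and the already-completed "fill the kettle with water" is penalized, so every view votes for "put tea bag in cup". Separating extraction from planning in this way keeps the planner's input short even when the progress history is long.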
Cite
Shi et al. "EPD: Long-Term Memory Extraction, Context-Aware Planning and Multi-Iteration Decision @ EgoPlan Challenge ICML 2024." ICML 2024 Workshops: MFM-EAI, 2024.
BibTeX
@inproceedings{shi2024icmlw-epd,
title = {{EPD: Long-Term Memory Extraction, Context-Aware Planning and Multi-Iteration Decision @ EgoPlan Challenge ICML 2024}},
author = {Shi, Letian and Lv, Qi and Deng, Xiang and Nie, Liqiang},
booktitle = {ICML 2024 Workshops: MFM-EAI},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/shi2024icmlw-epd/}
}