$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-Grained Events from Videos
Abstract
Analyzing Fast, Frequent, and Fine-grained ($F^3$) events presents a significant challenge in video analytics and multi-modal LLMs. Current methods struggle to identify events that satisfy all the $F^3$ criteria with high accuracy, due to challenges such as motion blur and subtle visual discrepancies. To advance research in video understanding, we introduce $F^3Set$, a benchmark of video datasets for precise $F^3$ event detection. Datasets in $F^3Set$ are characterized by their extensive scale and comprehensive detail, typically encompassing over 1,000 event types with precise timestamps and supporting multi-level granularity. Currently, $F^3Set$ contains several sports datasets, and the framework may be extended to other applications as well. We evaluated popular temporal action understanding methods on $F^3Set$, revealing substantial challenges for existing techniques. Additionally, we propose a new method, $F^3ED$, for $F^3$ event detection, which achieves superior performance. The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set.
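How a dataset can reach over 1,000 event types while still supporting multi-level granularity can be illustrated with a minimal sketch, assuming that each fine-grained event type is the combination of several independent attributes and that a coarser level is obtained by keeping only some of those attributes. The attribute names and vocabularies, and the `Event`, `coarsen`, and `all_event_types` helpers, are hypothetical and are not taken from the $F^3Set$ label schema or its released code.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical attribute vocabularies for a tennis-like domain.
# Illustrative only: NOT the actual F^3Set label schema.
ATTRIBUTES = {
    "shot_type": ["serve", "forehand", "backhand", "volley"],
    "direction": ["cross", "middle", "line"],
    "technique": ["flat", "topspin", "slice"],
    "outcome": ["in", "winner", "error"],
}


@dataclass(frozen=True)
class Event:
    """A single timestamped event with one label per attribute (hypothetical)."""
    frame: int     # precise timestamp, expressed as a frame index
    labels: tuple  # one value per attribute, in ATTRIBUTES order

    def coarsen(self, keep):
        """Return a coarser label that keeps only the named attributes."""
        names = list(ATTRIBUTES)
        return tuple(v for n, v in zip(names, self.labels) if n in keep)


def all_event_types():
    """Enumerate every fine-grained type as a combination of attribute values."""
    return list(product(*ATTRIBUTES.values()))


if __name__ == "__main__":
    e = Event(frame=1532, labels=("forehand", "cross", "topspin", "winner"))
    print(len(all_event_types()))               # fine-grained type count
    print(e.coarsen({"shot_type"}))             # coarsest level
    print(e.coarsen({"shot_type", "outcome"}))  # intermediate level
```

With only four small vocabularies the combinations already yield 4 × 3 × 3 × 3 = 108 types; adding a few more attributes or values quickly pushes the count past 1,000, which is one way to see why per-type training data becomes sparse.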
Cite
Liu et al. "$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-Grained Events from Videos." International Conference on Learning Representations, 2025.
BibTeX
@inproceedings{liu2025iclr-3set,
title = {{$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-Grained Events from Videos}},
author = {Liu, Zhaoyu and Jiang, Kan and Ma, Murong and Hou, Zhe and Lin, Yun and Dong, Jin Song},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/liu2025iclr-3set/}
}