Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
Abstract
Predictive modeling of unannotated spatiotemporal data presents inherent challenges, primarily due to the highly entangled visual dynamics in real-world scenes. To tackle these complexities, we introduce a novel insight through Disentangling Deterministic and Probabilistic (DDP) modeling. We note a key observation in spatiotemporal data where low-level details typically remain stable, whereas high-level motion frequently exhibits dynamic variations. The core motivation involves constructing two distinct pathways in the latent space: a deterministic path and a probabilistic path. The probabilistic path begins by defining the motion flow, which explicitly describes complex many-to-many motion patterns between patches, and models its probabilistic distribution using a motion diffuser. The deterministic path incorporates a spectral-aware enhancer to retain and amplify visual details in the frequency domain. These designs ensure visual consistency while also capturing intricate long-term motion dynamics. Extensive experiments demonstrate the superiority of DDP across diverse scenario evaluations.
Cite
Text
Zhou et al. "Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/195
Markdown
[Zhou et al. "Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/zhou2024ijcai-focus/) doi:10.24963/ijcai.2024/195
BibTeX
@inproceedings{zhou2024ijcai-focus,
title = {{Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition}},
author = {Zhou, Bangbang and Qu, Yadong and Wang, Zixiao and Li, Zicheng and Zhang, Boqiang and Xie, Hongtao},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2024},
pages = {1762--1770},
doi = {10.24963/ijcai.2024/195},
url = {https://mlanthology.org/ijcai/2024/zhou2024ijcai-focus/}
}