Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition

Abstract

Predictive modeling of unannotated spatiotemporal data is inherently challenging, primarily because visual dynamics in real-world scenes are highly entangled. To tackle this, we propose Disentangling Deterministic and Probabilistic (DDP) modeling. Our key observation is that, in spatiotemporal data, low-level details typically remain stable, whereas high-level motion frequently varies. The core idea is to construct two distinct pathways in the latent space: a deterministic path and a probabilistic path. The probabilistic path first defines the motion flow, which explicitly describes complex many-to-many motion patterns between patches, and then models its probabilistic distribution using a motion diffuser. The deterministic path incorporates a spectral-aware enhancer to retain and amplify visual details in the frequency domain. Together, these designs ensure visual consistency while capturing intricate long-term motion dynamics. Extensive experiments demonstrate the superiority of DDP across diverse evaluation scenarios.

Cite

Text

Zhou et al. "Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/195

Markdown

[Zhou et al. "Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/zhou2024ijcai-focus/) doi:10.24963/ijcai.2024/195

BibTeX

@inproceedings{zhou2024ijcai-focus,
  title     = {{Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition}},
  author    = {Zhou, Bangbang and Qu, Yadong and Wang, Zixiao and Li, Zicheng and Zhang, Boqiang and Xie, Hongtao},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {1762--1770},
  doi       = {10.24963/ijcai.2024/195},
  url       = {https://mlanthology.org/ijcai/2024/zhou2024ijcai-focus/}
}