COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts

Abstract

Recognizing irregular text remains a challenging topic in text recognition. To encourage research on this topic, we provide a novel comic onomatopoeia dataset (COO), which consists of onomatopoeia texts in Japanese comics. COO contains many arbitrary texts, such as extremely curved, partially shrunk, or arbitrarily placed texts. Furthermore, some texts are separated into several parts. Each part is a truncated text and is not meaningful by itself; the parts must be linked to represent the intended meaning. Thus, we propose a novel task that predicts the links between truncated texts. To detect onomatopoeia regions and capture their intended meaning, we conduct three tasks: text detection, text recognition, and link prediction. Through extensive experiments, we analyze the characteristics of COO. Our data and code are available at https://github.com/ku21fan/COO-Comic-Onomatopoeia.
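
To make the link prediction task concrete, the following minimal sketch shows how predicted links might be used to reassemble truncated parts into a full onomatopoeia. It is an illustration only, not the authors' method or the dataset's annotation format: the function name `join_truncated_texts`, the integer part ids, and the representation of a link as a `(source, destination)` pair are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def join_truncated_texts(parts: Dict[int, str],
                         links: List[Tuple[int, int]]) -> List[str]:
    """Concatenate truncated parts by following predicted links.

    Hypothetical format: `parts` maps a part id to its recognized text,
    and each link (src, dst) means "part dst continues part src".
    """
    next_part = dict(links)                      # src id -> dst id
    has_predecessor = {dst for _, dst in links}  # parts that continue another
    results = []
    for start in sorted(parts):
        if start in has_predecessor:
            continue  # not the head of a chain, so it is reached via a link
        text, current, seen = "", start, set()
        # Walk the chain; `seen` guards against revisiting a part.
        while current is not None and current not in seen:
            seen.add(current)
            text += parts[current]
            current = next_part.get(current)
        results.append(text)
    return results


# Two truncated parts linked into one onomatopoeia, plus one standalone text.
parts = {0: "ドド", 1: "ドド", 2: "バン"}
links = [(0, 1)]
print(join_truncated_texts(parts, links))  # ['ドドドド', 'バン']
```

In this sketch, a part with no incoming link is treated as the start of a text, so parts that form a pure cycle of links would be skipped; a real pipeline would need to decide how to handle such inconsistent predictions.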

Cite

Text

Baek et al. "COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19815-1_16

Markdown

[Baek et al. "COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/baek2022eccv-coo/) doi:10.1007/978-3-031-19815-1_16

BibTeX

@inproceedings{baek2022eccv-coo,
  title     = {{COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts}},
  author    = {Baek, Jeonghun and Matsui, Yusuke and Aizawa, Kiyoharu},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19815-1_16},
  url       = {https://mlanthology.org/eccv/2022/baek2022eccv-coo/}
}