Textmatcher: Cross-Attentional Neural Network to Compare Image and Text

Arrigoni, Valentina; Repele, Luisa; Saccavino, Dario Marino

doi:10.1007/S10994-023-06418-6

MLJ 2024 pp. 2045-2066

Abstract

We study a multimodal-learning problem where, given an image containing a single line of (printed or handwritten) text and a candidate text transcription, the goal is to assess whether the text represented in the image corresponds to the candidate text. This problem, which we dub text matching, is primarily motivated by a real industrial application: automated cheque processing, whose goal is to automatically assess whether the information on a bank cheque (e.g., the issue date) matches the data entered by the customer when depositing the cheque at an automated teller machine (ATM). The problem also applies to several other scenarios, e.g., processing personal identity documents in user-registration procedures. We devise a machine-learning model specifically designed for the text-matching problem. The proposed model, termed TextMatcher, compares the two inputs by applying a novel cross-attention mechanism over the embedding representations of image and text, and it is trained end to end on the desired distribution of errors to be detected. We demonstrate the effectiveness of TextMatcher on the automated-cheque-processing use case, where, unlike existing models designed for related problems, TextMatcher is shown to generalize well to future, unseen dates. We further assess the performance of TextMatcher under different error distributions on the public IAM dataset. Results show that TextMatcher achieves higher performance across a variety of configurations than a naïve model, a variant in which fully-connected layers replace the cross-attention module, and existing models for related problems.
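The abstract does not spell out TextMatcher's architecture, so the following is only a minimal, hypothetical sketch of the general idea of comparing image and text through cross-attention; it is not the authors' model. It assumes the image has already been encoded into a sequence of column feature vectors and the candidate text into per-character embeddings of the same dimension. Text tokens act as queries over the image features (standard scaled dot-product attention), and a linear head turns the pooled agreement into a match probability. The names `cross_attention`, `match_probability`, `w_out` and `b_out` are illustrative, and the weights are random and untrained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    Each query (a text-token embedding) attends over all keys (image-column
    embeddings) and receives a weighted average of the values.
    Returns (attended_vectors, attention_weights).
    """
    scale = math.sqrt(len(keys[0]))
    attended, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        attended.append(
            [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
        )
    return attended, weights


def match_probability(text_emb, image_emb, w_out, b_out):
    """Hypothetical match head (assumption, not TextMatcher's actual head).

    text_emb:  per-character embeddings (queries)
    image_emb: per-column image features (keys and values), same dimension
    w_out, b_out: parameters of a linear classifier (untrained here)
    """
    attended, _ = cross_attention(text_emb, image_emb, image_emb)
    dim = len(text_emb[0])
    # Agreement feature: element-wise product of each text token with the
    # image context it attended to, mean-pooled over the text tokens.
    pooled = [
        sum(t[j] * a[j] for t, a in zip(text_emb, attended)) / len(text_emb)
        for j in range(dim)
    ]
    return sigmoid(dot(w_out, pooled) + b_out)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8
    text_emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]    # e.g. 6 characters
    image_emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(32)]  # e.g. 32 image columns
    w_out = [rng.gauss(0, 1) for _ in range(dim)]
    p = match_probability(text_emb, image_emb, w_out, 0.0)
    print(f"match probability (untrained weights): {p:.3f}")
```

In a real system the encoders and the head would be learned jointly end to end, with negative pairs drawn from the error distribution one wants to detect; this sketch only shows the forward pass.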


Cite

Text

Arrigoni et al. "Textmatcher: Cross-Attentional Neural Network to Compare Image and Text." Machine Learning, 2024. doi:10.1007/S10994-023-06418-6

Markdown

[Arrigoni et al. "Textmatcher: Cross-Attentional Neural Network to Compare Image and Text." Machine Learning, 2024.](https://mlanthology.org/mlj/2024/arrigoni2024mlj-textmatcher/) doi:10.1007/S10994-023-06418-6

BibTeX

@article{arrigoni2024mlj-textmatcher,
  title     = {{Textmatcher: Cross-Attentional Neural Network to Compare Image and Text}},
  author    = {Arrigoni, Valentina and Repele, Luisa and Saccavino, Dario Marino},
  journal   = {Machine Learning},
  year      = {2024},
  pages     = {2045--2066},
  doi       = {10.1007/S10994-023-06418-6},
  volume    = {113},
  url       = {https://mlanthology.org/mlj/2024/arrigoni2024mlj-textmatcher/}
}