Two-Step Validation in Character-Based Ingredient Normalization

Abstract

Although ingredients are an important part of the information in a recipe, they are difficult to process automatically because they are written as informal, user-generated text. To normalize ingredients, we can use a character-based encoder-decoder model that takes the character sequence of an ingredient as input and outputs its canonical form. However, such a model has two problems. First, it often generates unnatural sequences as outputs. Second, the generated sequences are sometimes unrelated to the original ingredient. We therefore propose a two-step validation to generate better normalizations. In the first validation step, we use a trie to limit the normalization candidates to existing sequences. In the second validation step, we rerank the normalization candidates based on their similarity to the original ingredient. We conducted experiments using a corpus of approximately 10,000 pairs of ingredients and their canonical forms and showed that the proposed validation improved the performance of encoder-decoder models.
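
The two validation steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `build_trie`, `allowed_next`, `constrained_candidates`, and `rerank` are hypothetical, `score_fn` is a stand-in for the log-probability a trained character decoder would assign, and the character-overlap ratio from `difflib` is an assumed similarity measure, since the abstract does not specify which one is used.

```python
from difflib import SequenceMatcher

END = "$"  # hypothetical end-of-sequence marker, assumed absent from the vocabulary


def build_trie(words):
    """Build a nested-dict trie over the known canonical forms."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}  # mark that a canonical form ends here
    return root


def allowed_next(trie, prefix):
    """Return the characters (or END) that may follow `prefix` in the trie."""
    node = trie
    for ch in prefix:
        node = node.get(ch)
        if node is None:
            return set()
    return set(node)


def constrained_candidates(trie, score_fn, beam_size=3, max_len=30):
    """Step 1: beam search that only expands prefixes present in the trie."""
    beams = [("", 0.0)]
    finished = []
    for _ in range(max_len):
        expanded = []
        for prefix, score in beams:
            for ch in allowed_next(trie, prefix):
                new_score = score + score_fn(prefix, ch)
                if ch == END:
                    finished.append((prefix, new_score))
                else:
                    expanded.append((prefix + ch, new_score))
        if not expanded:
            break
        beams = sorted(expanded, key=lambda item: -item[1])[:beam_size]
    return sorted(finished, key=lambda item: -item[1])


def rerank(source, candidates):
    """Step 2: reorder candidates by similarity to the original ingredient."""
    def similarity(candidate):
        return SequenceMatcher(None, source, candidate[0]).ratio()
    return sorted(candidates, key=similarity, reverse=True)


# Toy usage: a uniform scorer replaces the real decoder (an assumption).
vocab = ["tomato", "potato", "onion"]
trie = build_trie(vocab)
candidates = constrained_candidates(trie, score_fn=lambda prefix, ch: 0.0)
best = rerank("tomatoes", candidates)[0][0]
print(best)
```

Because the search never leaves the trie, every candidate is an existing canonical form, which addresses the first problem. The reranking step then prefers candidates that resemble the input, which addresses the second.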

Cite

Text

Harashima and Yamada. "Two-Step Validation in Character-Based Ingredient Normalization." International Joint Conference on Artificial Intelligence, 2018. doi:10.1145/3230519.3230589

Markdown

[Harashima and Yamada. "Two-Step Validation in Character-Based Ingredient Normalization." International Joint Conference on Artificial Intelligence, 2018.](https://mlanthology.org/ijcai/2018/harashima2018ijcai-two/) doi:10.1145/3230519.3230589

BibTeX

@inproceedings{harashima2018ijcai-two,
  title     = {{Two-Step Validation in Character-Based Ingredient Normalization}},
  author    = {Harashima, Jun and Yamada, Yoshiaki},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {29--32},
  doi       = {10.1145/3230519.3230589},
  url       = {https://mlanthology.org/ijcai/2018/harashima2018ijcai-two/}
}