On Short Textual Value Column Representation Using Symbol Level Language Models
Abstract
String-type database columns containing short textual values are crucial for storing and managing a wide range of information across many applications; for example, they store categories, labels, enumerations, codes, and abbreviations. Here, we discuss a string column representation based on symbol-level language models that captures the symbol-level "distribution" of the column's textual values. These language models are known for their good prediction quality, small memory footprint, and runtime efficiency, and they are theoretically justified. We focus on a column matching application and provide empirical indications of their usefulness.
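The abstract does not specify which symbol-level models are used. The following is a minimal sketch, assuming a fixed-order character model smoothed recursively toward shorter contexts. This is a simple stand-in, not necessarily the models studied in the paper. Each column is represented by a model trained on its values, and a source column is matched to the candidate whose model assigns the source values the lowest average log-loss. The class `SymbolModel`, the functions `column_score` and `match_column`, the padding symbols, the parameters `order`, `beta`, and `alphabet_size`, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

START, END = "\x02", "\x03"  # hypothetical padding and end-of-value symbols


class SymbolModel:
    """Order-k symbol (character) model trained on one column's values.

    Illustrative stand-in: P_j(s | last j symbols) = (count + beta * P_{j-1}) / (total + beta),
    recursing from j = 0 up to j = order, with P_{-1} uniform over `alphabet_size` symbols.
    """

    def __init__(self, values, order=2, beta=1.0, alphabet_size=256):
        self.order, self.beta, self.alphabet_size = order, beta, alphabet_size
        # counts[context][symbol] for every context length 0..order
        self.counts = defaultdict(lambda: defaultdict(int))
        for value in values:
            for ctx, sym in self._events(value):
                for j in range(order + 1):
                    self.counts[ctx[len(ctx) - j:]][sym] += 1

    def _events(self, value):
        """Yield (preceding `order` symbols, next symbol) pairs, ending with END."""
        padded = START * self.order + value + END
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def prob(self, ctx, sym):
        p = 1.0 / self.alphabet_size  # base case: uniform over the assumed alphabet
        for j in range(self.order + 1):  # blend in longer contexts one at a time
            table = self.counts.get(ctx[len(ctx) - j:], {})
            p = (table.get(sym, 0) + self.beta * p) / (sum(table.values()) + self.beta)
        return p

    def log_loss(self, value):
        """Average -log2 probability per symbol of `value`, end marker included."""
        events = list(self._events(value))
        return -sum(math.log2(self.prob(c, s)) for c, s in events) / len(events)


def column_score(source_values, model):
    """Mean per-value log-loss of a source column under a candidate column's model."""
    return sum(model.log_loss(v) for v in source_values) / len(source_values)


def match_column(source_values, candidates, order=2):
    """Return the candidate column whose model best predicts the source values."""
    scores = {name: column_score(source_values, SymbolModel(vals, order))
              for name, vals in candidates.items()}
    return min(scores, key=scores.get), scores


candidates = {
    "country_code": ["US", "DE", "FR", "IL", "GB", "JP"],
    "sku": ["AB-1042", "XY-2291", "QK-0007", "LM-5530"],
    "status": ["active", "inactive", "pending", "active"],
}
best, scores = match_column(["IT", "ES", "CA"], candidates)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

On this toy data, the short upper-case source values score best under the `country_code` model. The sketch smooths toward a uniform distribution over one fixed alphabet rather than over each column's observed symbols. As a result, every model is normalized over the same symbol set, and log-losses from different candidate columns remain comparable.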
Cite
Begleiter and Roll. "On Short Textual Value Column Representation Using Symbol Level Language Models." NeurIPS 2024 Workshops: TRL, 2024.
@inproceedings{begleiter2024neuripsw-short,
title = {{On Short Textual Value Column Representation Using Symbol Level Language Models}},
author = {Begleiter, Ron and Roll, Nathan},
booktitle = {NeurIPS 2024 Workshops: TRL},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/begleiter2024neuripsw-short/}
}