Persona-Aware Generative Model for Code-Mixed Language

Abstract

Code-mixing and script-mixing are prevalent across online social networks and multilingual societies. However, a user's preference for code-mixing depends on the user's socioeconomic status, demographics, and local context, which existing generative models tend to ignore when generating code-mixed texts. In this work, we make a pioneering attempt to develop a persona-aware generative model that generates texts resembling the real-life code-mixed texts of individuals. We propose PARADOX, a novel Transformer-based encoder-decoder model for code-mixed text generation that encodes an utterance conditioned on a user's persona and generates code-mixed texts without monolingual reference data. We also propose an alignment module that re-calibrates the generated sequence to resemble real-life code-mixed texts. PARADOX generates code-mixed texts that are semantically more meaningful and linguistically more valid. To evaluate the personification capabilities of PARADOX, we propose four new metrics: CM BLEU, CM Rouge-1, CM Rouge-L, and CM KS. On average, PARADOX achieves 1.6% better CM BLEU, 57% better perplexity, and 32% better semantic coherence than its non-persona-based counterparts.
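As an illustration only (the abstract does not specify the architecture's internals), one common way to condition an encoder on a user's persona is to prepend a learned persona embedding to the utterance's token embeddings before encoding, so the encoder attends to the persona as an extra leading position. The sketch below uses NumPy; all names, dimensions, and the conditioning scheme itself are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 16     # hypothetical hidden size
VOCAB = 100      # hypothetical vocabulary size
N_PERSONAS = 5   # hypothetical number of user personas

# Hypothetical embedding tables, randomly initialized for this sketch.
token_emb = rng.normal(size=(VOCAB, D_MODEL))
persona_emb = rng.normal(size=(N_PERSONAS, D_MODEL))

def encode_with_persona(token_ids, persona_id):
    """Prepend the user's persona embedding to the utterance embeddings,
    yielding a (seq_len + 1, d_model) input for a Transformer encoder."""
    tokens = token_emb[token_ids]                      # (seq_len, d_model)
    persona = persona_emb[persona_id][None, :]         # (1, d_model)
    return np.concatenate([persona, tokens], axis=0)   # (seq_len + 1, d_model)

x = encode_with_persona([3, 17, 42], persona_id=2)
print(x.shape)  # (4, 16)
```

In practice the persona vector could also be added to every position or injected via cross-attention; prepending is simply the most compact scheme to sketch.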

Cite

Text

Sengupta et al. "Persona-Aware Generative Model for Code-Mixed Language." Transactions on Machine Learning Research, 2024.

Markdown

[Sengupta et al. "Persona-Aware Generative Model for Code-Mixed Language." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/sengupta2024tmlr-personaaware/)

BibTeX

@article{sengupta2024tmlr-personaaware,
  title     = {{Persona-Aware Generative Model for Code-Mixed Language}},
  author    = {Sengupta, Ayan and Akhtar, Md Shad and Chakraborty, Tanmoy},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/sengupta2024tmlr-personaaware/}
}