MAT2I: Enhancing Perceptual Authenticity in Text-to-Image Synthesis Using Multi-Attribute Generative Adversarial Networks
Abstract
Text-to-image synthesis derives visual representations from textual descriptions and renders them as corresponding images. The technique is widely applied in fields such as graphic design and image editing. Generative adversarial networks (GANs) are widely used for this task and are among its stronger performers. A primary hurdle is producing perceptually authentic visuals. This study introduces a Multi-Attribute Text-to-Image Synthesis Generative Adversarial Network (MAT2I) to address this challenge. Its enhancements comprise an attribute-control-net, feature alignment, and a perceptual loss. The attribute-control-net enables fast, attribute-specific generation, preserving perceptual authenticity while remaining adaptable. Feature alignment and perceptual loss encourage the generator to produce visuals that closely resemble real images matching the accompanying text, and they reduce randomness. The model is evaluated on the CUB and COCO datasets. Empirical findings show that the approach generates visuals with greater content diversity, enhanced realism, and improved semantic alignment with the provided text descriptions. The proposed method also surpasses the compared techniques in inception score, supporting its competitive performance.
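The inception score mentioned in the abstract is conventionally defined as exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is a classifier's class distribution for a generated image and p(y) is the average of those distributions over all images. The following is a minimal sketch of that standard formula only, not the paper's evaluation code. The function name `inception_score` and the toy probability rows are assumptions for illustration; in practice the probabilities come from a pretrained Inception network.

```python
import math


def inception_score(probs):
    """Compute exp(mean KL(p(y|x) || p(y))) from per-image class probabilities.

    probs: list of rows, each row a class distribution p(y|x) summing to 1.
    The rows used below are hypothetical toy values, not real model outputs.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y): the mean of p(y|x) over all images.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    mean_kl = 0.0
    for row in probs:
        # Terms with p(y|x) = 0 contribute nothing to the KL divergence.
        mean_kl += sum(
            p * math.log(p / marginal[j]) for j, p in enumerate(row) if p > 0
        )
    mean_kl /= n
    return math.exp(mean_kl)


# Every image gets the same uncertain prediction: the lowest possible score.
uniform = [[0.5, 0.5], [0.5, 0.5]]
# Confident predictions spread evenly over both classes: the highest score.
confident = [[1.0, 0.0], [0.0, 1.0]]
print(inception_score(uniform), inception_score(confident))
```

The score ranges from 1 (all predictions identical) up to the number of classes (confident predictions spread evenly across classes), which is why a higher value is read as a sign of both image quality and diversity.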
Cite
Text
Singh et al. "MAT2I: Enhancing Perceptual Authenticity in Text-to-Image Synthesis Using Multi-Attribute Generative Adversarial Networks." Journal of Artificial Intelligence Research, 2025. doi:10.1613/JAIR.1.18237
Markdown
[Singh et al. "MAT2I: Enhancing Perceptual Authenticity in Text-to-Image Synthesis Using Multi-Attribute Generative Adversarial Networks." Journal of Artificial Intelligence Research, 2025.](https://mlanthology.org/jair/2025/singh2025jair-mat2i/) doi:10.1613/JAIR.1.18237
BibTeX
@article{singh2025jair-mat2i,
  title = {{MAT2I: Enhancing Perceptual Authenticity in Text-to-Image Synthesis Using Multi-Attribute Generative Adversarial Networks}},
  author = {Singh, Varsha and Singh, Vijai and Tiwary, Uma Shanker},
  journal = {Journal of Artificial Intelligence Research},
  year = {2025},
  pages = {2453-2469},
  doi = {10.1613/JAIR.1.18237},
  volume = {82},
  url = {https://mlanthology.org/jair/2025/singh2025jair-mat2i/}
}