Semantic Alignment of Malicious Question Based on Contrastive Semantic Networks and Data Augmentation

Abstract

Identifying and filtering malicious texts on social media is a significant technical challenge, with the goal of protecting users from online violence and disinformation. The difficulty stems from the diversity and novelty of social media texts, which include unique expressions and unusual sentence structures. In particular, malicious texts in interrogative form are hard to align with traditional corpora because existing methods fail to exploit the deep global semantic representations of the text. The problem is compounded by the scarcity of research on Chinese texts, which limits recognition accuracy. To address these challenges, we introduce a framework based on a Global Contrastive Semantic Network (GCSN), designed to improve the efficiency and accuracy of malicious text recognition by learning global semantic knowledge. It comprises an encoder for modelling global semantic information and a graph-matching network for evaluating semantic similarity between question pairs, enabling accurate identification and filtering of malicious texts with complex structures. Furthermore, we introduce a semantic consistency-based data augmentation method (COMBINE), which uses real-world data to generate balanced positive and negative samples, enriching the dataset and enhancing the model's ability to distinguish semantic consistency through contrastive learning. Experimental validation on two Chinese datasets demonstrates our model's exceptional performance, affirming its application value in social media malicious text recognition. Our code is available at https://github.com/Wxy13131313131/GCSN-COMBINE
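
To make the idea of contrastive learning over question pairs concrete, here is a minimal sketch of a generic margin-based pairwise contrastive loss. It is not taken from the paper or its repository: the function names, the use of cosine distance, and the `margin` value are illustrative assumptions, and the embeddings are plain Python lists standing in for whatever an encoder would produce.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_contrastive_loss(u, v, label, margin=0.5):
    """Generic pairwise contrastive loss (illustrative, not the paper's loss).

    label == 1: the two questions are semantically consistent (positive pair),
                so the loss grows with their cosine distance.
    label == 0: the questions are inconsistent (negative pair), so the loss is
                non-zero only when they are closer than `margin`.
    """
    distance = 1.0 - cosine_similarity(u, v)
    if label == 1:
        return distance ** 2
    return max(0.0, margin - distance) ** 2
```

Under these assumptions, a positive pair with identical embeddings incurs no loss, while a negative pair with identical embeddings is penalised until the embeddings are pushed at least `margin` apart.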

Cite

Text

Wang et al. "Semantic Alignment of Malicious Question Based on Contrastive Semantic Networks and Data Augmentation." Journal of Artificial Intelligence Research, 2025. doi:10.1613/JAIR.1.16369

Markdown

[Wang et al. "Semantic Alignment of Malicious Question Based on Contrastive Semantic Networks and Data Augmentation." Journal of Artificial Intelligence Research, 2025.](https://mlanthology.org/jair/2025/wang2025jair-semantic/) doi:10.1613/JAIR.1.16369

BibTeX

@article{wang2025jair-semantic,
  title     = {{Semantic Alignment of Malicious Question Based on Contrastive Semantic Networks and Data Augmentation}},
  author    = {Wang, Xinyan and Liu, Jinshuo and Deng, Juan and Wang, Meng and Deng, Qian and Yan, Youcheng and Wang, Lina and Ma, Yunsong and Pan, Jeff Z.},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2025},
  pages     = {1243-1266},
  doi       = {10.1613/JAIR.1.16369},
  volume    = {82},
  url       = {https://mlanthology.org/jair/2025/wang2025jair-semantic/}
}