Investigating the Security Threat Arising from "Yes-No" Implicit Bias in Large Language Models

Abstract

Large Language Models (LLMs) have gained significant attention for their exceptional performance across various domains. Despite these advancements, concerns persist about their implicit bias, which often leads to negative social impacts. It is therefore essential to identify implicit bias in LLMs and investigate the potential threat it poses. Our study focused on a specific type of implicit bias, termed the "Yes-No" implicit bias: LLMs' inherent tendency to favor "Yes" or "No" responses to a single instruction. By comparing the probability of LLMs generating a series of "Yes" versus "No" responses, we observed that LLMs exhibit different inherent response tendencies when faced with different instructions. To further investigate the impact of such bias, we developed an attack method called Implicit Bias In-Context Manipulation, which attempts to manipulate LLMs' behavior. Specifically, we explored whether the "Yes" implicit bias could flip "No" responses to "Yes" in LLMs' responses to malicious instructions, leading to harmful outputs. Our findings reveal that the "Yes" implicit bias poses a significant security threat, comparable to that of carefully designed attack methods. Moreover, we offer a comprehensive analysis from multiple perspectives to deepen understanding of this security threat, emphasizing the need for ongoing improvement in LLMs' security.
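The abstract's core measurement, comparing the probability a model assigns to a "Yes" versus a "No" response for a given instruction, can be sketched as below. This is a minimal illustration, not the authors' implementation: the logit scores are hypothetical stand-ins for a real LLM's next-token outputs, and no model is actually queried.

```python
import math

def yes_no_bias(logits):
    """Return P("Yes") under a softmax restricted to the two candidate
    responses, mirroring the paper's comparison of "Yes" vs "No"
    tendencies for a single instruction.

    `logits` maps the candidate tokens "Yes" and "No" to raw scores
    (hypothetical stand-ins for an LLM's next-token logits).
    """
    y, n = logits["Yes"], logits["No"]
    m = max(y, n)  # subtract the max for numerical stability
    p_yes = math.exp(y - m) / (math.exp(y - m) + math.exp(n - m))
    return p_yes  # > 0.5 indicates a "Yes" implicit bias on this instruction

# Illustrative scores only (not taken from any real model)
p = yes_no_bias({"Yes": 2.1, "No": 0.3})
```

In practice one would read these scores from a model's logits for the first response token; restricting the softmax to the two candidates makes the comparison independent of the rest of the vocabulary.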

Cite

Text

Du et al. "Investigating the Security Threat Arising from 'Yes-No' Implicit Bias in Large Language Models." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I22.34554

Markdown

[Du et al. "Investigating the Security Threat Arising from 'Yes-No' Implicit Bias in Large Language Models." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/du2025aaai-investigating/) doi:10.1609/AAAI.V39I22.34554

BibTeX

@inproceedings{du2025aaai-investigating,
  title     = {{Investigating the Security Threat Arising from "Yes-No" Implicit Bias in Large Language Models}},
  author    = {Du, Yanrui and Zhao, Sendong and Ma, Ming and Chen, Yuhan and Qin, Bing},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {23823--23831},
  doi       = {10.1609/AAAI.V39I22.34554},
  url       = {https://mlanthology.org/aaai/2025/du2025aaai-investigating/}
}