Controlling Large Language Models Through Concept Activation Vectors

Zhang, Hanyu; Wang, Xiting; Li, Chengao; Ao, Xiang; He, Qing

doi:10.1609/AAAI.V39I24.34778

AAAI 2025 pp. 25851-25859

Abstract

As large language models (LLMs) are widely deployed across various domains, the ability to control their generated outputs has become more critical. This control involves aligning LLM outputs with human values and ethical principles, or customizing LLMs to specific topics or styles for individual users. Existing controlled generation methods either require significant computational resources and extensive trial-and-error or provide only coarse-grained control. In this paper, we propose Generation with Concept Activation Vector (GCAV), a lightweight model control framework that ensures accurate control without requiring resource-intensive fine-tuning. Specifically, GCAV first trains a concept activation vector for each concept to be controlled, such as toxicity. During inference, GCAV steers the concept vector in the LLM, for example, by removing the toxicity concept vector from the activation layers. Control experiments from different perspectives, including toxicity reduction, sentiment control, linguistic style, and topic control, demonstrate that our framework achieves state-of-the-art performance with granular control, allowing for fine-grained adjustments of both the steering layers and the steering magnitudes for individual samples.
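
The two steps named in the abstract (obtain a concept vector, then remove it from an activation at inference time) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the vector is trained or how the steering magnitude is chosen, so the sketch assumes a simple difference-of-means direction in place of a trained concept activation vector, synthetic 8-dimensional "activations" in place of real LLM hidden states, and a hand-set `strength` parameter. All function names are hypothetical.

```python
import math
import random


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def concept_direction(positive, negative):
    """Assumed stand-in for a trained concept vector:
    the normalized difference between the two class means."""
    mean_pos = mean_vector(positive)
    mean_neg = mean_vector(negative)
    return unit([p - n for p, n in zip(mean_pos, mean_neg)])


def steer(activation, direction, strength=1.0):
    """Subtract `strength` times the projection of the activation
    onto the (unit-length) concept direction."""
    projection = dot(activation, direction)
    return [a - strength * projection * d for a, d in zip(activation, direction)]


# Synthetic data (an assumption): "toxic" activations are shifted along axis 0.
rng = random.Random(0)
dim = 8
toxic = [[rng.gauss(0, 1) + (3.0 if i == 0 else 0.0) for i in range(dim)]
         for _ in range(200)]
benign = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(200)]

cav = concept_direction(toxic, benign)
example = toxic[0]
steered = steer(example, cav)            # full removal of the concept component
half_steered = steer(example, cav, 0.5)  # partial removal for the same sample
```

With `strength=1.0` the steered activation has no remaining component along the concept direction, and smaller values remove proportionally less. That per-sample knob is only a loose analogue of the adjustable steering magnitude the abstract describes.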

Cite

Text

Zhang et al. "Controlling Large Language Models Through Concept Activation Vectors." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I24.34778

Markdown

[Zhang et al. "Controlling Large Language Models Through Concept Activation Vectors." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhang2025aaai-controlling/) doi:10.1609/AAAI.V39I24.34778

BibTeX

@inproceedings{zhang2025aaai-controlling,
  title     = {{Controlling Large Language Models Through Concept Activation Vectors}},
  author    = {Zhang, Hanyu and Wang, Xiting and Li, Chengao and Ao, Xiang and He, Qing},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {25851-25859},
  doi       = {10.1609/AAAI.V39I24.34778},
  url       = {https://mlanthology.org/aaai/2025/zhang2025aaai-controlling/}
}