Surely You’re Lying, Mr. Model: Improving and Analyzing CCS

Abstract

Contrast-Consistent Search (CCS) (Burns et al., 2022) is a method for eliciting latent knowledge from language models without supervision. In this paper, we explore several directions for improving CCS. First, we use conjunctive logic to make CCS fully unsupervised. Second, we investigate which factors contribute to CCS’s poor performance on autoregressive models; replicating Belrose & Mallen (2023), we improve CCS’s performance on these models and study the effect of multi-shot context. Finally, replicating Halawi et al. (2023), we add early-exit baselines to the original CCS experiments to better characterize where CCS techniques add value.
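For readers unfamiliar with CCS, here is a minimal NumPy sketch of the objective described by Burns et al. (2022). It is not this paper’s code. Each statement is turned into a contrast pair, x⁺ (answered "yes/true") and x⁻ (answered "no/false"). The model’s hidden states for each side are normalized, and a linear probe with a sigmoid maps them to probabilities p(x⁺) and p(x⁻). The loss has two terms. A consistency term pushes p(x⁺) toward 1 − p(x⁻). A confidence term penalizes the degenerate solution p(x⁺) = p(x⁻) = 0.5. The function names (`normalize`, `probe`, `ccs_loss`, `predict`) are illustrative assumptions. The normalization shown (per-side mean-centering plus scaling by the standard deviation) is one reasonable choice, not necessarily the exact preprocessing used in any particular experiment.

```python
import numpy as np


def normalize(h: np.ndarray) -> np.ndarray:
    """Mean-center and scale hidden states across examples (apply to each contrast side separately)."""
    h = h - h.mean(axis=0, keepdims=True)
    return h / (h.std(axis=0, keepdims=True) + 1e-8)


def probe(h: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Linear probe followed by a sigmoid: p(x) = sigmoid(h . w + b)."""
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))


def ccs_loss(p_pos: np.ndarray, p_neg: np.ndarray) -> float:
    """Unsupervised CCS objective averaged over contrast pairs."""
    # Consistency: the probabilities of a statement and its negation should sum to 1.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the trivial solution where both probabilities equal 0.5.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))


def predict(p_pos: np.ndarray, p_neg: np.ndarray) -> np.ndarray:
    """Averaged truth score for each pair; threshold at 0.5 to get a binary answer."""
    return 0.5 * (p_pos + (1.0 - p_neg))


if __name__ == "__main__":
    # Random stand-ins for hidden states of 8 contrast pairs with 16 features each.
    rng = np.random.default_rng(0)
    h_pos = normalize(rng.normal(size=(8, 16)))
    h_neg = normalize(rng.normal(size=(8, 16)))
    w, b = rng.normal(size=16) * 0.1, 0.0
    p_pos, p_neg = probe(h_pos, w, b), probe(h_neg, w, b)
    print("CCS loss:", ccs_loss(p_pos, p_neg))
    print("predictions:", predict(p_pos, p_neg) > 0.5)
```

In practice, `w` and `b` would be fit by gradient descent on `ccs_loss`. No labels enter this objective. Because the loss is symmetric under swapping the two sides, a learned probe can only identify truth up to a sign flip.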

Cite

Text

Bashkansky et al. "Surely You’re Lying, Mr. Model: Improving and Analyzing CCS." ICML 2023 Workshops: DeployableGenerativeAI, 2023.

Markdown

[Bashkansky et al. "Surely You’re Lying, Mr. Model: Improving and Analyzing CCS." ICML 2023 Workshops: DeployableGenerativeAI, 2023.](https://mlanthology.org/icmlw/2023/bashkansky2023icmlw-surely/)

BibTeX

@inproceedings{bashkansky2023icmlw-surely,
  title     = {{Surely You’re Lying, Mr. Model: Improving and Analyzing CCS}},
  author    = {Bashkansky, Naomi and Loughridge, Chloe R and Tang, Chuyue},
  booktitle = {ICML 2023 Workshops: DeployableGenerativeAI},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/bashkansky2023icmlw-surely/}
}