Comparing Optimization Targets for Contrast-Consistent Search

Abstract

We investigate the optimization target of contrast-consistent search (CCS), a method that aims to recover a large language model's internal representation of truth. We present a new loss function, which we call the Midpoint-Displacement (MD) loss. We demonstrate that, for a certain hyperparameter value, the MD loss yields a probe with weights very similar to those of the CCS probe. We further show that this hyperparameter value is not optimal and that, with a better value, the MD loss tentatively attains a higher test accuracy than CCS.
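For readers unfamiliar with the baseline, the following is a minimal sketch of the standard CCS objective as it is usually stated: a consistency term plus a confidence term, averaged over contrast pairs. It is not the MD loss, whose form and hyperparameter are not given in this abstract. The names `probe`, `ccs_loss`, `theta`, `b`, and `h` are illustrative assumptions, and the sketch omits the activation normalization and gradient-based training normally used in practice.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def probe(theta, b, h):
    """Linear probe followed by a sigmoid (hypothetical helper).

    theta: list of weights, b: bias, h: hidden-state vector for one statement.
    """
    return sigmoid(sum(t * x for t, x in zip(theta, h)) + b)


def ccs_loss(p_pos, p_neg):
    """Standard CCS objective averaged over contrast pairs.

    p_pos[i] is the probe output for the "true" phrasing of pair i,
    p_neg[i] is the probe output for the "false" phrasing of the same pair.
    """
    total = 0.0
    for p, q in zip(p_pos, p_neg):
        # Consistency: the two probabilities should sum to one.
        consistency = (p - (1.0 - q)) ** 2
        # Confidence: penalize the degenerate solution p = q = 0.5.
        confidence = min(p, q) ** 2
        total += consistency + confidence
    return total / len(p_pos)
```

Under this sketch, a confident and consistent pair such as `ccs_loss([1.0], [0.0])` incurs zero loss, while the uninformative pair `ccs_loss([0.5], [0.5])` is penalized only through the confidence term.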

Cite

Text

Fry et al. "Comparing Optimization Targets for Contrast-Consistent Search." NeurIPS 2023 Workshops: SoLaR, 2023.

Markdown

[Fry et al. "Comparing Optimization Targets for Contrast-Consistent Search." NeurIPS 2023 Workshops: SoLaR, 2023.](https://mlanthology.org/neuripsw/2023/fry2023neuripsw-comparing/)

BibTeX

@inproceedings{fry2023neuripsw-comparing,
  title     = {{Comparing Optimization Targets for Contrast-Consistent Search}},
  author    = {Fry, Hugo and Fallows, Seamus and Wright, Jamie and Fan, Ian and Schoots, Nandi},
  booktitle = {NeurIPS 2023 Workshops: SoLaR},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/fry2023neuripsw-comparing/}
}