Path Independent Equilibrium Models Can Better Exploit Test-Time Computation

Anil, Cem; Pokle, Ashwini; Liang, Kaiqu; Treutlein, Johannes; Wu, Yuhuai; Bai, Shaojie; Kolter, J. Zico; Grosse, Roger B

NeurIPS 2022

Abstract

Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances. Recent efforts have shown promising results in this direction by making use of depth-wise recurrent networks. In this work, we reproduce the performance of the prior art using a broader class of architectures called equilibrium models, and find that stronger generalization performance on harder examples (which require more iterations of inference to get correct) strongly correlates with the path independence of the system—its ability to converge to the same attractor (or limit cycle) regardless of initialization, given enough computation. Experimental interventions made to promote path independence result in improved generalization on harder (and thus more compute-hungry) problem instances, while those that penalize it degrade this ability. Path independence analyses are also useful on a per-example basis: for equilibrium models that have good in-distribution performance, path independence on out-of-distribution samples strongly correlates with accuracy. Thus, considering equilibrium models and path independence jointly leads to a valuable new viewpoint under which we can study the generalization performance of these networks on hard problem instances.
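The abstract's notion of path independence (converging to the same limiting behaviour regardless of initialization) can be illustrated with a toy fixed-point iteration. The sketch below is only an illustration under stated assumptions, not the authors' models or their measurement procedure: the layer `tanh(W z + U x)`, the helpers `make_layer`, `fixed_point`, and `path_independence_gap`, and all sizes are hypothetical. `W` is rescaled to spectral norm 0.5 so the map is a contraction, which guarantees a unique fixed point in this toy setting.

```python
import numpy as np


def fixed_point(f, z0, num_iters=200):
    """Apply z <- f(z) a fixed number of times (naive forward iteration)."""
    z = z0
    for _ in range(num_iters):
        z = f(z)
    return z


def make_layer(rng, dim, scale=0.5):
    """Hypothetical toy layer z -> tanh(W z + U x).

    W is rescaled so its spectral norm equals `scale` < 1. Since tanh is
    1-Lipschitz, the layer is then a contraction in z.
    """
    W = rng.standard_normal((dim, dim))
    W *= scale / np.linalg.norm(W, 2)  # ord=2 on a matrix is the spectral norm
    U = rng.standard_normal((dim, dim))

    def layer(z, x):
        return np.tanh(W @ z + U @ x)

    return layer


def path_independence_gap(layer, x, rng, dim, num_iters=200):
    """Distance between the states reached from a zero and a random init."""
    z_zero = fixed_point(lambda z: layer(z, x), np.zeros(dim), num_iters)
    z_rand = fixed_point(lambda z: layer(z, x), rng.standard_normal(dim), num_iters)
    return float(np.linalg.norm(z_zero - z_rand))


rng = np.random.default_rng(0)
dim = 8
layer = make_layer(rng, dim)
x = rng.standard_normal(dim)  # stand-in for an input injection
gap = path_independence_gap(layer, x, rng, dim)
print(f"gap between the two converged states: {gap:.2e}")
```

A gap near zero means both initializations reached the same state, which is the behaviour the abstract calls path independent. A trained network is not guaranteed to be contractive, so in practice the gap would have to be measured empirically for each input rather than assumed.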

Cite

Text

Anil et al. "Path Independent Equilibrium Models Can Better Exploit Test-Time Computation." Neural Information Processing Systems, 2022.

Markdown

[Anil et al. "Path Independent Equilibrium Models Can Better Exploit Test-Time Computation." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/anil2022neurips-path/)

BibTeX

@inproceedings{anil2022neurips-path,
  title     = {{Path Independent Equilibrium Models Can Better Exploit Test-Time Computation}},
  author    = {Anil, Cem and Pokle, Ashwini and Liang, Kaiqu and Treutlein, Johannes and Wu, Yuhuai and Bai, Shaojie and Kolter, J. Zico and Grosse, Roger B},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/anil2022neurips-path/}
}