Beyond What if: Advancing Counterfactual Text Generation with Structural Causal Modeling

Abstract

Adaptive gradient methods, primarily based on Adam, are prevalent in training neural networks, adjusting step sizes via exponentially decaying averages of gradients and squared gradients. Adam assigns small weights to distant gradients, termed long-tail gradients in this paper. However, these gradients persistently influence update behavior, potentially degrading generalization performance. To address this issue, we incorporate a restart mechanism into moment estimations, proposing AdaR (ADAptive gradient methods via Restarting moment estimations). Specifically, AdaR divides a training epoch into fixed-iteration intervals, alternating between two sets of moment estimations for parameter updates and discarding prior moment estimations at the beginning of each interval. Within each interval, one set updates parameters and will be discarded in the subsequent interval, while the other is reset at the midpoint to estimate moments for updates in the subsequent interval. The restart mechanism cyclically discards distant gradients, initiates fresh moment estimations for parameter updates, and stabilizes training. By prioritizing recent gradients, the method increases estimation accuracy and enhances step size adjustment. Empirically, AdaR outperforms state-of-the-art optimization algorithms on image classification and language modeling tasks, demonstrating superior generalization and faster convergence.
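The restart schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `RestartingMoments`, the function `adar_sketch`, the scalar parameter, the default hyperparameters, and the choice to bias-correct using the number of gradients seen since the last reset are all assumptions made for illustration. It only shows one plausible reading of "two sets of moment estimations" that alternate per interval, with the standby set reset at the interval midpoint.

```python
import math


class RestartingMoments:
    """Adam-style first/second moment estimates that can be reset (hypothetical helper)."""

    def __init__(self, beta1=0.9, beta2=0.999):
        self.beta1, self.beta2 = beta1, beta2
        self.reset()

    def reset(self):
        # Discard all previously accumulated gradients.
        self.m = 0.0
        self.v = 0.0
        self.t = 0  # gradients observed since the last reset

    def observe(self, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad

    def step_direction(self, eps=1e-8):
        # Assumption: bias correction uses the count since the last reset.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return m_hat / (math.sqrt(v_hat) + eps)


def adar_sketch(grad_fn, x0, lr=0.1, interval=20, num_steps=200):
    """Minimise a scalar function using two alternating sets of moment estimates."""
    active, standby = RestartingMoments(), RestartingMoments()
    x = x0
    active_history = []  # gradients held by the active set at each step
    for step in range(num_steps):
        pos = step % interval
        if pos == 0 and step > 0:
            # New interval: the standby set takes over parameter updates;
            # the old active set is discarded at the coming midpoint reset.
            active, standby = standby, active
        if pos == interval // 2:
            # Midpoint: restart the standby set so it is fresh for the next interval.
            standby.reset()
        g = grad_fn(x)
        active.observe(g)
        standby.observe(g)
        x -= lr * active.step_direction()
        active_history.append(active.t)
    return x, active_history


# Usage on a toy quadratic f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_final, active_history = adar_sketch(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Under these assumptions, after the first interval the set used for updates always holds between `interval // 2 + 1` and `3 * interval // 2` recent gradients, so gradients older than that never influence the step size.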

Cite

Text

Wang et al. "Beyond What if: Advancing Counterfactual Text Generation with Structural Causal Modeling." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/721

Markdown

[Wang et al. "Beyond What if: Advancing Counterfactual Text Generation with Structural Causal Modeling." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/wang2024ijcai-beyond-a/) doi:10.24963/ijcai.2024/721

BibTeX

@inproceedings{wang2024ijcai-beyond-a,
  title     = {{Beyond What if: Advancing Counterfactual Text Generation with Structural Causal Modeling}},
  author    = {Wang, Ziao and Zhang, Xiaofeng and Du, Hongwei},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {6522--6530},
  doi       = {10.24963/ijcai.2024/721},
  url       = {https://mlanthology.org/ijcai/2024/wang2024ijcai-beyond-a/}
}