Cheng, Xiang
22 publications
NeurIPS 2025. From SoftMax to Score: Transformers Can Effectively Implement In-Context Denoising Steps
ICML 2024. Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions in Context
NeurIPS 2023. Transformers Learn to Implement Preconditioned Gradient Descent for In-Context Learning