Notable Works
Influential works not published at venues the anthology already indexes.
50 works spanning 1763–2021.
The Foundations
The foundation of everything probabilistic in ML.
The Three Laws of Robotics. "After 'Runaround' appeared in the March 1942 issue of Astounding, I never stopped thinking about how minds might work." — Marvin Minsky
The first mathematical model of a neuron. Everything starts here.
Entropy, mutual information, channel capacity. Every loss function in deep learning is downstream of this.
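Entropy is a one-liner to compute. A minimal sketch (toy distributions, values are just illustrative):

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A fair coin carries exactly one bit of uncertainty.
fair = entropy([0.5, 0.5])      # 1.0
# A biased coin carries less; a certain outcome carries none.
biased = entropy([0.9, 0.1])    # ~0.47
certain = entropy([1.0])        # 0.0
```

Cross-entropy loss is this same quantity measured against the model's predicted distribution, which is the sense in which every loss function is downstream of it.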
The imitation game and the question of whether machines can think.
The call-to-arms for a generation of AI researchers. Search, pattern recognition, learning, planning, induction.
Neural Computation
"Neurons that fire together wire together." The origin of associative learning.
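The rule itself is a one-line outer product. A toy sketch of Hebbian associative memory (pattern and corruption chosen arbitrarily for illustration):

```python
import numpy as np

# Store a +/-1 pattern with the Hebbian outer-product rule, then recall it
# from a corrupted cue.
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = np.outer(pattern, pattern)   # "fire together, wire together"
np.fill_diagonal(W, 0)           # no self-connections

cue = pattern.copy()
cue[:2] *= -1                    # flip two bits
recalled = np.sign(W @ cue)      # the cue settles back to the stored memory
```

This same construction, iterated over many stored patterns, is the Hopfield network's storage rule.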
The idea of neural networks as computation.
The neuroscience that inspired convolutional networks.
The book that killed connectionism for a decade. The XOR problem.
The origin of convolutional neural networks.
Brought physicists into neural networks, led to Boltzmann machines.
Backpropagation. You know what this is.
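At its core it is the chain rule applied backward through the network. A minimal sketch for a single sigmoid neuron with squared-error loss, with the analytic gradient checked against a finite-difference estimate (all numbers arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, b, x, t):
    return 0.5 * (sigmoid(w * x + b) - t) ** 2

x, t = 1.5, 1.0
w, b = 0.3, -0.2

# Forward pass, then backward pass via the chain rule:
# dL/dw = (y - t) * y * (1 - y) * x
y = sigmoid(w * x + b)
grad_w = (y - t) * y * (1 - y) * x

# Numerical gradient check
eps = 1e-6
grad_w_num = (loss(w + eps, b, x, t) - loss(w - eps, b, x, t)) / (2 * eps)
```

The 1986 insight was that this bookkeeping composes cleanly across layers, so the same backward sweep trains arbitrarily deep networks.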
The LeNet paper.
Deep belief nets. The paper that ended the second AI winter.
Statistical Learning Theory
SGD. You also know what this is.
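The Robbins-Monro scheme in its original form: follow one noisy gradient at a time with a decaying step size. A toy sketch (target mean and sample count chosen for illustration):

```python
import numpy as np

# Stochastic approximation: find the minimiser of E[(x - theta)^2],
# i.e. the mean, from one noisy sample at a time, with step size 1/t.
rng = np.random.default_rng(42)
theta = 0.0
for t in range(1, 10001):
    x = rng.normal(loc=3.0, scale=1.0)   # one noisy observation
    grad = 2 * (theta - x)               # gradient of the one-sample loss
    theta -= (1 / t) * 0.5 * grad        # decaying Robbins-Monro step
# theta converges toward 3.0
```

With the 1/t schedule this update is exactly the running sample mean, which is the cleanest illustration of why the decaying-step-size conditions guarantee convergence.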
Algorithmic probability, compression and intelligence.
VC dimension. The theoretical foundation of statistical learning.
The EM algorithm. Cited constantly across all of machine learning.
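The two-step structure is easy to see in the simplest nontrivial case, a two-component 1-D Gaussian mixture. A compact sketch (synthetic data, crude initialisation, all constants illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

mu = np.array([x.min(), x.max()])   # crude initialisation
var = np.array([1.0, 1.0])
w = np.array([0.5, 0.5])            # mixture weights

for _ in range(50):
    # E-step: each component's responsibility for each point
    dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted maximum-likelihood updates
    n = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / n
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n
    w = n / len(x)
# mu recovers the two cluster means, near 0 and 5
```

Every iteration is guaranteed not to decrease the likelihood, which is the property the 1977 paper proved in general.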
Minimum description length. The formal link between compression and learning.
PAC learning. The origin of computational learning theory.
The universal approximation theorem; why neural networks work at all.
The original treatment of Bayesian neural networks.
Reinforcement Learning
The Bellman equation. The foundation of dynamic programming and RL.
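Value iteration is the Bellman equation turned into a fixed-point loop. A minimal sketch on a made-up four-state chain (states 0..3, state 3 terminal, reward 1 for entering it, discount 0.9):

```python
import numpy as np

gamma = 0.9
n_states = 4
V = np.zeros(n_states)
for _ in range(100):
    V_new = np.zeros(n_states)
    for s in range(n_states - 1):            # state 3 is terminal, V = 0
        values = []
        for nxt in (max(s - 1, 0), s + 1):   # actions: step left, step right
            r = 1.0 if nxt == n_states - 1 else 0.0
            values.append(r + gamma * V[nxt])
        V_new[s] = max(values)               # Bellman optimality backup
    V = V_new
# V converges to [0.81, 0.9, 1.0, 0.0]
```

Each backup is "immediate reward plus discounted value of the best successor", which is the Bellman optimality equation verbatim; Q-learning is the sampled, model-free version of the same update.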
Coined the term "machine learning."
Intrinsic motivation, curiosity-driven learning. Predates the exploration-exploitation literature in deep RL.
Information Coding
Maximum entropy. The bridge between Shannon's information theory and statistical inference.
Efficient coding hypothesis, redundancy reduction. The intellectual ancestor of autoencoders, sparse coding, and noncontrastive SSL.
The textbook that taught ML how to use information theory. KL divergence, rate-distortion, channel capacity.
Guess what else Schmidhuber said he anticipated in 1997? Hint: it starts with A.
The information bottleneck. Learning as compression formalized.
Word2Vec. King minus man plus woman equals queen.
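The famous analogy is just vector arithmetic plus a nearest-neighbour lookup by cosine similarity. A toy sketch with hand-built two-dimensional vectors (not trained embeddings; dimension 0 loosely encodes "royalty", dimension 1 "gender"):

```python
import numpy as np

vecs = {
    "king":   np.array([0.9,  0.9]),
    "queen":  np.array([0.9, -0.9]),
    "prince": np.array([0.9,  0.5]),
    "man":    np.array([0.1,  0.9]),
    "woman":  np.array([0.1, -0.9]),
}

def nearest(v, exclude):
    """Word whose vector has the highest cosine similarity to v."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vecs if w not in exclude), key=lambda w: cos(v, vecs[w]))

target = vecs["king"] - vecs["man"] + vecs["woman"]
answer = nearest(target, exclude={"king", "man", "woman"})
# answer is "queen"
```

The surprise of the 2013 paper was that real trained embeddings exhibit this linear structure without anyone building it in.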
Critiques
The Lighthill Report's verdict on AI's first two decades: "In no part of the field have the discoveries made so far produced the major impact that was then promised."
The Chinese Room. The famous argument against strong AI.
The embodied cognition manifesto. Looks increasingly prescient.
The book that made the world take AI risk seriously.
Stochastic parrots. The paper that launched the AI ethics debate and got an author fired.
Proposals and Founding Documents
A two-page funding application that named the field of artificial intelligence.
Parallel feature detection. An early conceptual ancestor of convolutional networks.
The founding document of evolutionary computation.
Graphical models. Probabilistic reasoning made tractable.
Blog Posts and Informal Publications
Taught a lot of people how RNNs work.
Neural network interpretability as visual explanation. The seed of Distill.
The empirical receipts for the scaling hypothesis.