Chain Rule
The chain rule is the calculus rule for computing derivatives of composite functions. If y = f(g(x)), then:
dy/dx = (dy/dg) × (dg/dx)
In the context of neural-network training, every computation is a composition of primitive operations. backpropagation applies the chain rule recursively through the computational-graph:
∂L/∂input = (∂output/∂input) × (∂L/∂output)
[local gradient] [incoming gradient]
The chain rule is the entire mathematical content of backpropagation. automatic-differentiation engines automate its application across arbitrarily deep graphs.
Sources
- karpathy-2022-micrograd-backpropagation — chain rule explicitly implemented in each operation’s
_backwardclosure