Chain Rule

The chain rule is the calculus rule for computing derivatives of composite functions. If y = f(g(x)), then:

dy/dx = (dy/dg) × (dg/dx)
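
As a concrete illustration, the sketch below takes y = sin(x^2), so g(x) = x^2 and f(u) = sin(u), and checks the chain-rule result against a finite-difference estimate. The choice of functions and the helper names (g, f, dy_dx, numeric_derivative) are assumptions made for this example only.

```python
import math


def g(x):
    # Inner function: g(x) = x^2
    return x * x


def f(u):
    # Outer function: f(u) = sin(u)
    return math.sin(u)


def dy_dx(x):
    # Chain rule: dy/dx = (dy/dg) * (dg/dx) = cos(g(x)) * 2x
    dy_dg = math.cos(g(x))
    dg_dx = 2.0 * x
    return dy_dg * dg_dx


def numeric_derivative(fn, x, h=1e-6):
    # Central finite difference, used here only as a sanity check
    return (fn(x + h) - fn(x - h)) / (2.0 * h)


x0 = 1.5
analytic = dy_dx(x0)
numeric = numeric_derivative(lambda x: f(g(x)), x0)
print(analytic, numeric)  # the two values should agree to several decimal places
```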

In the context of neural-network training, the full computation from inputs to loss is a composition of primitive operations. Backpropagation applies the chain rule recursively through the computational graph, working backward from the loss L:

∂L/∂input = (∂output/∂input) × (∂L/∂output)
             [local gradient]   [incoming gradient]
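
The local-gradient times incoming-gradient pattern can be sketched by hand for a tiny scalar example. The loss L = (w*x + b - t)^2 and the function name forward_backward are assumptions chosen for illustration; they do not come from any particular library.

```python
def forward_backward(w, x, b, t):
    # Forward pass, split into primitive operations
    z = w * x   # multiply
    y = z + b   # add
    e = y - t   # subtract
    L = e * e   # square

    # Backward pass: at each step, local gradient * incoming gradient
    dL_dL = 1.0                # seed gradient at the loss
    dL_de = (2.0 * e) * dL_dL  # d(e*e)/de = 2e
    dL_dy = 1.0 * dL_de        # d(y - t)/dy = 1
    dL_dz = 1.0 * dL_dy        # d(z + b)/dz = 1
    dL_db = 1.0 * dL_dy        # d(z + b)/db = 1
    dL_dw = x * dL_dz          # d(w*x)/dw = x
    dL_dx = w * dL_dz          # d(w*x)/dx = w
    return L, {"w": dL_dw, "x": dL_dx, "b": dL_db}


loss, grads = forward_backward(w=2.0, x=3.0, b=1.0, t=5.0)
print(loss, grads)
```

Each backward line reuses the gradient computed on the line before it, which is why the pass runs from the loss toward the inputs.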

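Mathematically, backpropagation is nothing more than the chain rule; what it adds is an efficient evaluation order that reuses each incoming gradient instead of recomputing it. Automatic differentiation engines automate this process across arbitrarily deep graphs.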
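
To make the idea of automating the chain rule concrete, here is a minimal sketch of a scalar reverse-mode engine. The class name Value and its attributes are hypothetical, only addition and multiplication are supported, and production engines are considerably more elaborate.

```python
class Value:
    """A scalar node in a computational graph (hypothetical, for illustration)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # Local gradient of a sum is 1 for both operands
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Local gradient of a product is the other operand
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Order nodes so each one is processed after everything that consumes it
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # seed: dL/dL = 1
        for node in reversed(order):
            node._backward()


w, x, b = Value(2.0), Value(3.0), Value(1.0)
y = w * x + b
L = y * y
L.backward()
print(L.data, w.grad, x.grad, b.grad)
```

Gradients are accumulated with += so that a node used more than once, such as y in y * y, receives the sum of the contributions from every path.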


Sources