Automatic Differentiation
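Automatic differentiation (autodiff, often called autograd after the libraries that implement it) is a software technique that computes derivatives of functions defined as computer programs, exact up to floating-point rounding. The three main ways to compute derivatives compare as follows: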
- Numerical differentiation: approximates `f'(x) ≈ (f(x+h) − f(x)) / h`; imprecise, and slow for many parameters because each parameter needs its own extra function evaluation
- Symbolic differentiation: algebraically manipulates expressions; produces exact but often enormous formulas
- Automatic differentiation: records the sequence of primitive operations at runtime and applies the chain rule exactly and efficiently
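To make the imprecision of numerical differentiation concrete, the sketch below compares the forward-difference formula against a hand-derived derivative. The test function `f`, the evaluation point `x0`, and the step sizes are arbitrary choices made for illustration.

```python
import math

def f(x):
    # Example function chosen for illustration: f(x) = x^3 + sin(x)
    return x ** 3 + math.sin(x)

def f_prime_exact(x):
    # Hand-derived derivative; autodiff would reproduce this value
    # up to floating-point rounding
    return 3 * x ** 2 + math.cos(x)

def forward_difference(func, x, h):
    # Forward-difference approximation: (f(x+h) - f(x)) / h
    return (func(x + h) - func(x)) / h

x0 = 2.0
exact = f_prime_exact(x0)

# Absolute error of the approximation for several step sizes
errors = {
    h: abs(forward_difference(f, x0, h) - exact)
    for h in (1e-1, 1e-4, 1e-8, 1e-13)
}

for h, err in errors.items():
    print(f"h={h:g}  abs error={err:.3e}")
```

Shrinking `h` reduces the truncation error at first, but once `h` becomes very small the subtraction `f(x+h) - f(x)` loses precision and the error grows again. No single step size works well for every function, which is one reason autodiff is preferred for gradients.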
How It Works (Reverse-Mode)
Reverse-mode autodiff (used in neural network training) works in two passes:
- Forward pass: Evaluate the function, constructing a computational graph that records every primitive operation and its inputs.
- Backward pass (backpropagation): Starting from the output node with gradient 1, traverse the graph in reverse topological order, multiplying local gradients by incoming gradients to accumulate `∂output/∂input` for every node.
For a function with n inputs and 1 output, one backward pass computes all n partial derivatives simultaneously, which is essential for training neural networks with millions of weights.
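The two passes can be traced by hand on a small expression. The sketch below uses the illustrative function `out = (a*b + c)**2` with arbitrary input values. Every intermediate name (`d`, `e`, `grad_*`) is introduced here for the example only.

```python
# Forward pass: evaluate one primitive operation at a time,
# keeping every intermediate value for the backward pass.
a, b, c = 2.0, -3.0, 10.0
d = a * b        # multiplication node
e = d + c        # addition node
out = e ** 2     # squaring node (the single output)

# Backward pass: start at the output with gradient 1 and walk the
# operations in reverse, multiplying each local gradient by the
# gradient arriving from the node's consumer.
grad_out = 1.0
grad_e = 2 * e * grad_out   # d(e^2)/de = 2e
grad_d = 1.0 * grad_e       # d(d + c)/dd = 1
grad_c = 1.0 * grad_e       # d(d + c)/dc = 1
grad_a = b * grad_d         # d(a*b)/da = b
grad_b = a * grad_d         # d(a*b)/db = a

print(out, grad_a, grad_b, grad_c)
```

A single reverse sweep yields the gradient with respect to all three inputs. A forward-difference estimate would need one additional function evaluation per input.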
The Value Object Pattern
micrograd implements autograd using a `Value` class. Each `Value`:
- Holds a scalar `data` and a scalar `grad` (initialised to 0)
- Stores a reference to its `_backward` function (the local gradient rule)
- Maintains pointers to its child `Value` objects in the graph
When an operation is performed, a new `Value` is returned whose `_backward` closure knows how to push gradients to the operand `Value` objects.
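The pattern described above can be sketched in a few dozen lines. This is a simplified illustration in the spirit of micrograd, not its actual source: it supports only `+` and `*` between `Value` objects, and the attribute name `_children` and the helper `visit` are assumptions made for this example.

```python
class Value:
    """Scalar node in a computational graph (illustrative sketch)."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        # Leaf nodes have no operands, so there is nothing to propagate
        self._backward = lambda: None
        self._children = tuple(children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # Local gradient of addition is 1 for both operands
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Local gradient of a product is the other operand's value
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Build a topological ordering so each node's gradient is
        # complete before it is pushed to that node's operands.
        order, visited = [], set()

        def visit(node):
            if node not in visited:
                visited.add(node)
                for child in node._children:
                    visit(child)
                order.append(node)

        visit(self)
        self.grad = 1.0  # gradient of the output with respect to itself
        for node in reversed(order):
            node._backward()


a, b, c = Value(2.0), Value(-3.0), Value(10.0)
out = a * b + c * a  # a is used twice, so its gradient accumulates
out.backward()
print(out.data, a.grad, b.grad, c.grad)
```

The closures use `+=` rather than `=` because a node that feeds into several operations (here `a`) receives one gradient contribution per use, and the contributions must be summed.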
Implementations
| Library | Scale | Notes |
|---|---|---|
| micrograd | Scalar | Pedagogical; engine is roughly 100 lines; written by Andrej Karpathy |
| PyTorch | Tensor | Production; dynamic graph; dominant in research |
| JAX | Tensor | Functional; XLA-compiled; used at Google DeepMind |
| TensorFlow | Tensor | Static graph option; production deployment |
Sources
- karpathy-2022-micrograd-backpropagation — builds an autograd engine from scratch