Automatic Differentiation

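Automatic differentiation (autodiff or AD; many libraries call their implementation autograd) is a software technique that computes derivatives of functions defined as computer programs, exact up to floating-point rounding. Three ways of computing derivatives are worth distinguishing:

  • Numerical differentiation: approximates f'(x) ≈ (f(x+h) − f(x)) / h; subject to truncation and round-off error, and needs at least one extra function evaluation per parameter
  • Symbolic differentiation: algebraically manipulates expressions; produces exact but often enormous formulas
  • Automatic differentiation: records the sequence of primitive operations at runtime and applies the chain rule to them, giving exact derivatives at a cost comparable to evaluating the function itself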
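The accuracy limits of the numerical approach can be seen in a small experiment. The sketch below is illustrative only: the test function f, the evaluation point, and the step sizes are arbitrary choices, and f_prime is a derivative worked out by hand to serve as the reference value.

```python
import math

def f(x):
    # Arbitrary smooth test function chosen for illustration.
    return math.sin(x) * x ** 2

def f_prime(x):
    # Reference derivative, obtained by hand with the product rule.
    return math.cos(x) * x ** 2 + 2 * x * math.sin(x)

def forward_difference(func, x, h):
    # The finite-difference formula from the list above.
    return (func(x + h) - func(x)) / h

x = 1.0
reference = f_prime(x)

# Absolute error of the approximation for a coarse and a fine step size.
errors = {
    h: abs(forward_difference(f, x, h) - reference)
    for h in (1e-2, 1e-5)
}

for h, err in errors.items():
    print(f"h={h:g}  error={err:.3e}")
```

Shrinking h reduces the truncation error, but it cannot be pushed arbitrarily far: for very small h the subtraction f(x+h) − f(x) loses precision to floating-point round-off. For a function of n parameters, the same formula also has to be evaluated once per parameter.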

How It Works (Reverse-Mode)

Reverse-mode autodiff (used in neural network training) works in two passes:

  1. Forward pass: Evaluate the function, constructing a computational graph that records every primitive operation and its inputs.
  2. Backward pass (backpropagation): Starting from the output node with gradient 1, traverse the graph in reverse topological order. At each node, multiply its local gradients by the gradient arriving from the output side and add the result to each operand's gradient, so that every node ends up holding ∂output/∂node.

For a function with n inputs and 1 output, a single backward pass yields all n partial derivatives, at a cost that is a small constant multiple of one forward evaluation. This is what makes training neural networks with millions of weights practical.
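The two passes can be traced by hand on a small example. The function y = (x1·x2 + x3)² and the input values below are made up for illustration; no graph data structure is used, and each step of the backward pass is simply written out in the reverse of the forward order.

```python
# Forward pass: evaluate the function and keep every intermediate value.
x1, x2, x3 = 2.0, -3.0, 10.0
a = x1 * x2          # a = -6.0
b = a + x3           # b = 4.0
y = b ** 2           # y = 16.0

# Backward pass: start from dy/dy = 1 and visit the operations in reverse.
dy = 1.0
db = dy * 2 * b      # local gradient of b**2 is 2*b       -> 8.0
da = db * 1.0        # local gradient of a + x3 w.r.t. a   -> 8.0
dx3 = db * 1.0       # local gradient of a + x3 w.r.t. x3  -> 8.0
dx1 = da * x2        # local gradient of x1*x2 w.r.t. x1   -> -24.0
dx2 = da * x1        # local gradient of x1*x2 w.r.t. x2   -> 16.0

print(dx1, dx2, dx3)
```

One sweep from the output back to the inputs produces all three partial derivatives, and each intermediate gradient (db, da) is computed once and reused.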


The Value Object Pattern

micrograd implements autograd using a Value class. Each Value:

  • Holds a scalar data and a scalar grad (initialised to 0)
  • Stores a reference to its _backward function (the local gradient rule)
  • Maintains pointers to its child Value objects in the graph, that is, the operands it was computed from

When an operation is performed, a new Value is returned whose _backward closure knows how to push gradients to the operand Values.
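A minimal sketch of this pattern follows. It is loosely modelled on the description above rather than copied from micrograd: the attribute name _children, the restriction to addition and multiplication, and the example inputs are assumptions made for brevity.

```python
class Value:
    """Scalar node in a computation graph (simplified sketch)."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        # Leaf nodes have no operands, so there is nothing to propagate.
        self._backward = lambda: None
        self._children = tuple(children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # Local gradient of a sum is 1 for both operands.
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Local gradient of a product is the other operand's value.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order in which operands precede results.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for child in node._children:
                    visit(child)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs.
        for node in reversed(order):
            node._backward()


a, b, c = Value(2.0), Value(-3.0), Value(10.0)
y = a * b + c * a    # a is used twice, so its gradient accumulates
y.backward()
print(y.data, a.grad, b.grad, c.grad)
```

The closures use += rather than plain assignment so that a Value feeding into several operations, like a here, receives the sum of the gradients from each use.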


Implementations

Library      Scale    Notes
micrograd    Scalar   Pedagogical; 100 lines; Andrej Karpathy
PyTorch      Tensor   Production; dynamic graph; dominant in research
JAX          Tensor   Functional; XLA-compiled; used at Google DeepMind
TensorFlow   Tensor   Static graph option; production deployment

Sources