Neural Network

A neural network is a mathematical expression, represented as a computational graph, that maps input data to outputs through a sequence of operations parameterised by learnable weights. Training a neural network means finding weights that minimise a loss function measuring prediction error: backpropagation computes the gradient of the loss with respect to each weight, and gradient descent uses those gradients to adjust the weights.

Key Insight

Neural networks are not magic; they are structured mathematical expressions. The same backpropagation algorithm that computes gradients through any expression graph computes gradients through a neural network, because a network is just a particular kind of graph. Andrej Karpathy's micrograd demonstrates this: an autograd engine of roughly 100 lines suffices to train a complete network.
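To make the "network is just an expression graph" point concrete, here is a minimal sketch of a scalar autograd node in the spirit of micrograd. It is not Karpathy's actual code: the class name Value, its attributes, and the choice of supported operations (add, multiply, tanh) are assumptions made for illustration.

```python
import math


class Value:
    """A scalar node in an expression graph that remembers how it was produced."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(z)/dz = 1 - tanh(z)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph so that, when traversed in reverse,
        # every node is handled before the nodes it was computed from.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# A one-neuron "network": out = tanh(w * x + b)
x, w, b = Value(2.0), Value(-0.5), Value(0.25)
out = (w * x + b).tanh()
out.backward()  # fills in x.grad, w.grad and b.grad via the chain rule
```

Nothing in backward() knows that the graph is a neuron; it only walks nodes and applies each operation's local derivative, which is the sense in which a network is "just a particular kind of graph".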

Components

Neuron

The basic unit. A neuron computes:

output = activation( Σ(wᵢ × xᵢ) + b )

where xᵢ are inputs, wᵢ are weights, b is a bias, and activation is a nonlinear function (e.g., tanh, ReLU, sigmoid). Without the nonlinearity, stacked layers collapse to a single linear transformation.
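The formula above translates almost directly into code. The sketch below uses plain Python floats; the function names neuron and identity and all the numbers are made up for illustration. The second half checks the collapse claim in the simplest case: two stacked one-input neurons with no nonlinearity behave exactly like one linear neuron.

```python
import math


def neuron(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)


def identity(z):
    """Stand-in for 'no nonlinearity'."""
    return z


# A tanh neuron with two inputs: tanh(0.5 * 1.0 + (-0.25) * 2.0 + 0.1)
y = neuron([1.0, 2.0], [0.5, -0.25], 0.1)

# Two stacked linear neurons: w2 * (w1 * x + b1) + b2 ...
w1, b1, w2, b2 = 3.0, 1.0, -2.0, 0.5
x = 0.7
stacked = neuron([neuron([x], [w1], b1, identity)], [w2], b2, identity)

# ... equal a single linear neuron with weight w2 * w1 and bias w2 * b1 + b2.
collapsed = neuron([x], [w2 * w1], w2 * b1 + b2, identity)
```

Replacing identity with tanh in the stacked call breaks this equivalence, which is why the nonlinearity is what gives depth its extra expressive power.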

Layer

A collection of neurons operating in parallel on the same inputs, producing a vector of outputs. Each neuron in a layer has its own independent weights.
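As a rough sketch, a layer can be written as a list of neuron objects that all receive the same input vector. The class names, the tanh activation, and the uniform random initialisation in [-1, 1] are assumptions chosen for illustration, not something the sources prescribe.

```python
import math
import random


class Neuron:
    def __init__(self, n_inputs, rng):
        # Each neuron owns its weights and bias; nothing is shared.
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        self.bias = 0.0

    def __call__(self, inputs):
        z = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return math.tanh(z)


class Layer:
    def __init__(self, n_inputs, n_outputs, rng):
        self.neurons = [Neuron(n_inputs, rng) for _ in range(n_outputs)]

    def __call__(self, inputs):
        # Every neuron sees the same inputs; the results form the output vector.
        return [neuron(inputs) for neuron in self.neurons]


rng = random.Random(0)          # seeded so the example is repeatable
layer = Layer(3, 4, rng)        # 3 inputs, 4 neurons
outputs = layer([0.5, -1.0, 2.0])  # a list of 4 values, one per neuron
```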

Multi-Layer Perceptron (MLP)

The simplest deep network: a multi-layer perceptron is a sequence of layers in which the output of one layer is the input of the next. See multi-layer-perceptron for details.
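The chaining of layers can be sketched as a loop that feeds each layer's output into the next. The helper names (make_layer, layer_forward, mlp_forward), the layer sizes, and the random initialisation are hypothetical and chosen only to keep the example short.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """One layer: a row of weights and a bias for each of its n_out neurons."""
    return {
        "weights": [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                    for _ in range(n_out)],
        "biases": [0.0] * n_out,
    }


def layer_forward(layer, inputs):
    """Apply every neuron in the layer to the same input vector."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(layer["weights"], layer["biases"])
    ]


def mlp_forward(layers, inputs):
    """Feed the output of each layer into the next one."""
    activations = inputs
    for layer in layers:
        activations = layer_forward(layer, activations)
    return activations


rng = random.Random(0)
sizes = [3, 4, 4, 1]  # 3 inputs, two hidden layers of 4 neurons, 1 output
mlp = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
prediction = mlp_forward(mlp, [2.0, 3.0, -1.0])  # a list with one value
```

The only constraint linking consecutive layers is that the number of outputs of one matches the number of inputs of the next, which is what zip(sizes, sizes[1:]) enforces here.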

Training Pipeline

1. Forward pass: evaluate the expression from input to loss
2. Backward pass: run backpropagation to compute gradients
3. Weight update: adjust weights with gradient descent
4. Repeat until the loss converges
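The steps above can be sketched on the smallest possible model, a single linear neuron fitted with mean squared error. The data, learning rate, and step count are invented for illustration, and the gradients are written out by hand with the chain rule; an autograd engine would compute the same quantities automatically by backpropagation.

```python
# Toy data generated from y = 2x + 1 (made up for this example).
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

w, b = 0.0, 0.0          # the learnable weights, starting from zero
learning_rate = 0.1
losses = []

for step in range(200):
    # Forward pass: evaluate the expression from inputs to loss.
    predictions = [w * x + b for x, _ in data]
    loss = sum((p - y) ** 2 for p, (_, y) in zip(predictions, data)) / len(data)
    losses.append(loss)

    # Backward pass: gradient of the mean squared error w.r.t. w and b.
    grad_w = sum(2.0 * (p - y) * x for p, (x, y) in zip(predictions, data)) / len(data)
    grad_b = sum(2.0 * (p - y) for p, (_, y) in zip(predictions, data)) / len(data)

    # Weight update: step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# After the loop, w and b are close to the values that generated the data.
```

A fixed number of steps stands in for "until the loss converges"; a real loop would more likely stop when the loss, or a validation metric, stops improving.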

Scalars vs. Tensors

For teaching purposes, networks can be implemented with scalar operations (as in micrograd). In practice, scalars are grouped into tensors (multi-dimensional arrays) so that many operations run in parallel on suitable hardware. PyTorch and JAX do this transparently; the mathematics is unchanged.
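The sketch below computes one layer twice: once neuron by neuron with scalar arithmetic, and once as the whole-array expression tanh(W x + b). The helpers matvec, vec_add, and vec_tanh are hypothetical pure-Python stand-ins for the array operations a tensor library would provide; they are not PyTorch or JAX functions, and the numbers are made up.

```python
import math

inputs = [0.5, -1.0, 2.0]
weights = [[0.1, 0.2, 0.3],      # one row of weights per neuron
           [-0.4, 0.5, -0.6]]
biases = [0.0, 0.1]

# Scalar view: loop over neurons, then over each neuron's inputs.
scalar_outputs = []
for row, b in zip(weights, biases):
    total = b
    for w, x in zip(row, inputs):
        total += w * x
    scalar_outputs.append(math.tanh(total))


# Tensor view: the same layer written as operations on whole arrays.
def matvec(matrix, vector):
    """Matrix-vector product W x."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def vec_add(u, v):
    """Elementwise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def vec_tanh(v):
    """Elementwise tanh."""
    return [math.tanh(a) for a in v]


tensor_outputs = vec_tanh(vec_add(matvec(weights, inputs), biases))
# Both views give the same values, up to floating-point rounding.
```

The pure-Python helpers gain no speed; the point is only that the tensor form expresses the same arithmetic in a shape that a library can hand to parallel hardware.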

Relationship to LLMs

Large language models (LLMs) are specialised neural networks at enormous scale, built on the transformer architecture. The same forward/backward training loop applies; the architecture adds attention mechanisms and positional encodings on top of the same fundamental computation.

Sources

karpathy-2022-micrograd-backpropagation: builds a neural network from scratch using micrograd
raschka-2024-build-llm-from-scratch: builds a GPT (a large transformer neural network) in PyTorch
