Multi-Layer Perceptron (MLP)

A multi-layer perceptron (MLP) is the foundational deep neural-network architecture: a directed sequence of fully connected (dense) layers, each followed by a nonlinear activation function. Inputs flow forward through every layer; backpropagation flows gradients backward through the same path during training.

Architecture

Input → [Layer 1] → [Layer 2] → … → [Layer N] → Output

Each layer contains k neurons operating in parallel. Each neuron computes:

output = tanh( w₁x₁ + w₂x₂ + … + wₙxₙ + b )

The output vector of layer i is the input vector of layer i+1.
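
The per-neuron formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not micrograd's code: the function name `neuron` and the example values are assumptions chosen so that the weighted sum is easy to check by hand.

```python
import math

def neuron(x, w, b):
    """Return tanh(w1*x1 + ... + wn*xn + b) for a single neuron."""
    pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.tanh(pre_activation)

# Illustrative values (assumed): the weighted sum is exactly zero here.
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.0

# 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and tanh(0.0) = 0.0
print(neuron(x, w, b))
```

A layer applies k such neurons to the same input vector and collects their k outputs into the vector handed to the next layer.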

Why Nonlinearity Matters

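Without a nonlinear activation function (e.g., tanh, ReLU), a stack of layers collapses into a single linear (affine) transformation, so the network cannot learn nonlinear patterns no matter how many layers it has. Nonlinear activations are what make added depth useful: each layer can then build on the features computed by the previous one instead of merely re-mixing its inputs linearly.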
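
The collapse of stacked linear layers can be checked numerically. The sketch below uses small hand-picked matrices (all values and helper names are assumptions for illustration) and shows that applying two linear layers in sequence gives the same result as one layer with W = W2·W1 and b = W2·b1 + b2.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def linear(W, b, x):
    """A layer with no activation: W x + b."""
    return [y + bi for y, bi in zip(matvec(W, x), b)]

# Two linear layers with no activation between them (illustrative values).
W1 = [[1.0, 2.0], [0.0, -1.0]]
b1 = [1.0, 0.0]
W2 = [[2.0, 1.0]]
b2 = [3.0]

x = [1.0, 1.0]
stacked = linear(W2, b2, linear(W1, b1, x))

# The equivalent single layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = [y + bi for y, bi in zip(matvec(W2, b1), b2)]
collapsed = linear(W, b, x)

print(stacked, collapsed)  # both are [10.0]
```

Placing a tanh between the two layers breaks this identity, because tanh(W1 x + b1) is no longer a linear function of x.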

MLP as a Class Hierarchy

Andrej Karpathy's micrograd implements the MLP in ~50 lines of Python with three classes:

| Class | Responsibility |
| --- | --- |
| `Neuron` | Single dot product + bias + tanh |
| `Layer` | List of `Neuron`s operating in parallel |
| `MLP` | List of `Layer`s chained sequentially |

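This hierarchy mirrors how mainstream neural-network libraries, including PyTorch, organise models: small modules are composed into larger ones, and each level exposes the parameters of the levels beneath it.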
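
A rough sketch of the three classes follows. It is not micrograd's source: plain floats stand in for micrograd's autograd values, so no gradients are tracked, and the `rng` argument and the uniform(-1, 1) initialisation are assumptions made to keep the example reproducible.

```python
import math
import random

class Neuron:
    """Single dot product + bias + tanh."""
    def __init__(self, n_inputs, rng):
        self.w = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        self.b = rng.uniform(-1.0, 1.0)

    def __call__(self, x):
        return math.tanh(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def parameters(self):
        return self.w + [self.b]

class Layer:
    """List of Neurons that all read the same input vector."""
    def __init__(self, n_inputs, n_outputs, rng):
        self.neurons = [Neuron(n_inputs, rng) for _ in range(n_outputs)]

    def __call__(self, x):
        return [neuron(x) for neuron in self.neurons]

    def parameters(self):
        return [p for neuron in self.neurons for p in neuron.parameters()]

class MLP:
    """List of Layers; each layer's output feeds the next layer."""
    def __init__(self, n_inputs, layer_sizes, rng):
        sizes = [n_inputs] + list(layer_sizes)
        self.layers = [Layer(sizes[i], sizes[i + 1], rng)
                       for i in range(len(layer_sizes))]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]

rng = random.Random(0)           # fixed seed, for reproducibility only
model = MLP(3, [4, 4, 1], rng)   # 3 inputs, two hidden layers of 4, 1 output
out = model([2.0, 3.0, -1.0])

# Parameter count: 4*(3+1) + 4*(4+1) + 1*(4+1) = 41
print(len(out), len(model.parameters()))
```

Each class exposes a `parameters()` method that flattens the parameters of its children, which is what lets a training loop update every weight without knowing the network's shape.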

Training

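1. Forward pass: run the inputs through all layers to compute predictions, then compute the loss.
2. Backward pass: backpropagation computes ∂loss/∂w for every weight (and bias) w.
3. Update: gradient descent sets w ← w − lr × w.grad, where lr is the learning rate.
4. Reset all gradients to zero, since they accumulate across backward passes, then repeat from step 1.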
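
The four steps above can be sketched on the smallest possible model, a single tanh neuron with one input. This is an illustration under stated assumptions: the gradients are derived by hand with the chain rule rather than by micrograd's automatic backpropagation, and the dataset, sum-of-squares loss, learning rate, and step count are all made up for the example.

```python
import math

# Toy dataset (assumed): negative inputs map to -1, positive inputs to +1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1.0, -1.0, 1.0, 1.0]

params = [0.1, 0.0]   # [w, b], arbitrary starting point
grads = [0.0, 0.0]    # accumulated d(loss)/dw and d(loss)/db
lr = 0.1              # learning rate (assumed value)

def forward(params):
    """Return predictions and sum-of-squares loss over the dataset."""
    w, b = params
    preds = [math.tanh(w * x + b) for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys))
    return preds, loss

_, initial_loss = forward(params)

for step in range(100):
    # 1. Forward pass: predictions and loss.
    preds, loss = forward(params)

    # 2. Backward pass, by hand: d(loss)/d(pre) = 2*(p - y) * (1 - p^2),
    #    since d tanh(z)/dz = 1 - tanh(z)^2. Gradients are accumulated.
    for x, y, p in zip(xs, ys, preds):
        dloss_dpre = 2.0 * (p - y) * (1.0 - p * p)
        grads[0] += dloss_dpre * x   # d(pre)/dw = x
        grads[1] += dloss_dpre       # d(pre)/db = 1

    # 3. Gradient-descent update: w <- w - lr * grad.
    for i in range(len(params)):
        params[i] -= lr * grads[i]

    # 4. Reset gradients so the next iteration starts from zero.
    for i in range(len(grads)):
        grads[i] = 0.0

_, final_loss = forward(params)
print(initial_loss, final_loss)  # the loss should drop over training
```

Skipping step 4 would add each new gradient on top of the old ones, so later updates would use stale, inflated gradients.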

Relationship to Modern Architectures

MLPs are the conceptual ancestor of modern deep networks. The Transformer and GPT architectures use an MLP as the feed-forward sublayer within each transformer block (the FFN / MLP component). In mixture-of-experts models, each expert is typically an MLP of this kind, and a small gating network routes each token to a subset of the experts.

Sources

karpathy-2022-micrograd-backpropagation — primary; implements Neuron / Layer / MLP on micrograd
