Loss Function

A loss function (also called a cost function or objective function) measures how wrong a neural network’s predictions are. It reduces all prediction errors to a single scalar L. Training is the process of minimising L over the dataset by adjusting the network’s weights via gradient descent.

Role in Training

The loss is the root node of the computational graph. Backpropagation begins here:

1. Set L.grad = 1.0, since ∂L/∂L = 1
2. Propagate gradients backward through every operation to every weight

Because the loss is a scalar, one backward pass yields ∂L/∂w for every parameter simultaneously, a key property of reverse-mode automatic differentiation.
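The two steps above can be sketched with a tiny scalar autodiff class. This is a minimal illustration written in the spirit of micrograd, not its actual code: the `Value` class, its `_parents` and `_backward` attributes, and the one-weight model `w * x + b` are all assumptions made for the example.

```python
class Value:
    """Hypothetical scalar node in a computational graph."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0                  # will hold dL/d(this node)
        self._parents = parents
        self._backward = lambda: None    # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __sub__(self, other):
        out = Value(self.data - other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad -= out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so each node is processed after everything that uses it.
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0                  # step 1: seed the root with dL/dL = 1
        for node in reversed(order):     # step 2: propagate backward to every input
            node._backward()


# Squared error of a one-weight model on a single example.
w, b = Value(2.0), Value(-1.0)
x, y = Value(3.0), Value(4.0)
err = w * x + b - y
loss = err * err
loss.backward()                          # one pass fills in w.grad and b.grad
```

A single call to `loss.backward()` populates the gradient of every node in the graph, which is the property the paragraph above describes.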

Common Loss Functions

Loss	Use Case	Formula
Mean Squared Error (MSE)	Regression	`Σ(ŷ − y)² / n`
Cross-Entropy	Classification	`−Σ y log(ŷ)`
Binary Cross-Entropy	Binary classification	`−[y log(ŷ) + (1−y)log(1−ŷ)]`
Next-Token Prediction	Language modelling (pretraining)	Cross-entropy over vocabulary
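
As a rough sketch, the formulas in the table can be written in plain Python. The function names are hypothetical, the inputs are assumed to be probabilities strictly between 0 and 1 (no clamping is done), and averaging the next-token loss over positions is a common convention assumed here rather than something the table specifies.

```python
import math

def mse(preds, targets):
    """Mean squared error: sum((y_hat - y)^2) / n."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def cross_entropy(probs, one_hot):
    """Cross-entropy for one example: -sum(y * log(y_hat))."""
    # Terms with y = 0 contribute nothing, so they are skipped.
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t > 0)

def binary_cross_entropy(p, y):
    """Binary cross-entropy for one example with label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def next_token_loss(probs_per_position, target_ids):
    """Cross-entropy over the vocabulary, averaged over positions (assumed)."""
    total = sum(math.log(probs[t]) for probs, t in zip(probs_per_position, target_ids))
    return -total / len(target_ids)
```

Each function returns a single scalar, which is what allows backpropagation to start from one root node.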

Relationship to Gradient Descent

Gradient descent uses the gradients from backpropagation to update the weights, where lr is the learning rate:

w ← w − lr × ∂L/∂w

If the loss is lower after the update, the network’s predictions on that data have improved. Repeating this cycle (forward pass, loss, backward pass, update) over many batches of data is the entire training algorithm.
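
The update rule can be illustrated with a minimal training loop. This sketch assumes a one-weight model `w * x`, an MSE loss with a hand-derived gradient in place of backpropagation, and full-batch updates on a toy dataset rather than mini-batches. The `train` function and its default `lr` and `steps` values are hypothetical.

```python
def train(xs, ys, lr=0.05, steps=100):
    """Fit y ≈ w * x by gradient descent on the mean squared error."""
    w = 0.0
    history = []
    n = len(xs)
    for _ in range(steps):
        # Forward pass: compute the scalar loss L.
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        # Gradient dL/dw, derived by hand for this simple model.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Update: w <- w - lr * dL/dw
        w = w - lr * grad
        history.append(loss)
    return w, history


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # generated by the true weight w = 2
w, history = train(xs, ys)
```

With these values the recorded loss shrinks from step to step and `w` approaches 2. A learning rate that is too large would make the same loop diverge instead.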

Sources

karpathy-2022-micrograd-backpropagation — uses loss as the starting point of backpropagation in micrograd
