Rack-Scale Computing

Rack-scale computing is the design philosophy in which an entire server rack — including GPUs, CPUs, memory, networking, power delivery, and cooling — is engineered as a single co-optimised system rather than a collection of independently specified components. NVIDIA's NVLink 72 and Vera Rubin architectures are the primary current examples.

Why Rack-Scale? (Amdahl’s Law)

When a workload cannot fit on one machine, it must be distributed. At that point, every subsystem becomes a bottleneck: networking, memory bandwidth, power, and storage all constrain end-to-end throughput. amdahls-law shows that if computation is only 50% of total workload time, infinite GPU speedup yields only 2× system improvement. jensen-huang calls this the core motivation for extreme co-design: optimising the entire stack simultaneously across software, chips, systems, networking, power, and cooling.
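The 2× ceiling follows directly from Amdahl's law. A minimal sketch (the function name is my own, not from the source):

```python
# Amdahl's law: overall speedup when a fraction p of the workload
# is accelerated by a factor s, and the rest runs at original speed.
def amdahl_speedup(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# If computation is only 50% of total workload time, even an
# infinitely fast GPU caps end-to-end improvement at 2x.
print(amdahl_speedup(0.5, float("inf")))  # → 2.0

# A realistic 10x GPU speedup on that same 50% fraction helps even less:
print(round(amdahl_speedup(0.5, 10.0), 2))  # → 1.82
```

This is why accelerating the GPU alone has diminishing returns, and why the unaccelerated subsystems (networking, memory, storage, power) become the targets of co-design.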

The NVLink 72 interconnect allows an entire rack of GPUs to appear as a single logical GPU, enabling a 4–10-trillion-parameter mixture-of-experts model (large-language-models) to run as if on one device. The move from NVLink 8 to NVLink 72 was driven specifically by MoE architecture requirements.

The Vera Rubin pod (announced 2026) comprises:

  • 7 chip types, 5 rack types, 40 racks
  • 1,100+ Rubin GPUs, ~20,000 NVIDIA dies
  • 1.2 quadrillion transistors
  • 60 exaflops compute; 10 petabytes/second scale bandwidth
  • New storage accelerators and Vera CPU added for agentic-ai workloads (tool use and file I/O)

Each NVLink 72 rack alone contains 1.3–1.5 million components and roughly 1,300 chips, and is assembled in the supply chain rather than in the data centre.
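A quick back-of-envelope check, using only the pod figures quoted above; the per-unit averages are derived for illustration and are not official per-unit specs:

```python
# Figures quoted for the Vera Rubin pod (announced 2026).
transistors = 1.2e15   # 1.2 quadrillion transistors, pod-wide
dies = 20_000          # ~20,000 NVIDIA dies
gpus = 1_100           # 1,100+ Rubin GPUs
racks = 40             # 40 racks

# Derived averages (illustrative only).
print(f"~{transistors / dies / 1e9:.0f}B transistors per die")   # ~60B
print(f"~{dies / gpus:.1f} dies per GPU package")                # ~18.2
print(f"~{gpus / racks:.1f} GPUs per rack")                      # ~27.5
```

The ~60 billion transistors per die average is consistent with modern reticle-class accelerator dies, which lends the headline numbers some internal coherence.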

Supply Chain Integration

Rack-scale systems are too dense for on-site data-centre assembly. NVIDIA now builds complete supercomputer racks in the supply chain and ships them fully assembled, at 2–3 tonnes per rack. This shifts gigawatt-scale power and cooling requirements into manufacturing facilities, requiring capital investment from supply-chain partners.

See ai-factory for economic context, cuda for the software layer, and ai-scaling-laws for the demand drivers.


Source: fridman-huang-2026-nvidia-ai-revolution