Rack-Scale Computing
Rack-scale computing is the design philosophy in which an entire server rack — including GPUs, CPUs, memory, networking, power delivery, and cooling — is engineered as a single co-optimised system rather than a collection of independently specified components. NVIDIA's NVLink 72 and Vera Rubin architectures are the primary current examples.
Why Rack-Scale? (Amdahl’s Law)
When a workload cannot fit on one machine, it must be distributed. At that point, every subsystem becomes a potential bottleneck: networking, memory bandwidth, power, and storage all constrain end-to-end throughput. amdahls-law shows that if GPU computation accounts for only 50% of total workload time, even infinite GPU speedup yields at most a 2× end-to-end improvement. jensen-huang calls this the core motivation for extreme co-design: optimising the entire stack simultaneously across software, chips, systems, networking, power, and cooling.
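The 2× ceiling follows directly from Amdahl's law; a minimal sketch (the function name and figures are illustrative, not from the source):

```python
def amdahl_speedup(parallel_fraction: float, speedup_factor: float) -> float:
    """Overall speedup when only `parallel_fraction` of total runtime
    is accelerated by `speedup_factor` (Amdahl's law)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / speedup_factor)

# If GPU compute is 50% of the workload, even a 1000x faster GPU
# barely reaches 2x end-to-end; the limit as speedup -> infinity is exactly 2x.
print(amdahl_speedup(0.5, 1000))   # just under 2.0
print(amdahl_speedup(0.5, 1e12))   # effectively 2.0
```

This is why speeding up the GPU alone is not enough: the non-GPU half of the pipeline (networking, storage, orchestration) must be co-optimised to raise the ceiling itself.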
NVLink 72 and the Vera Rubin Pod
The NVLink 72 interconnect allows an entire rack of GPUs to appear as a single logical GPU, enabling a 4–10-trillion-parameter mixture-of-experts model (large-language-models) to run as if on one device. The move from NVLink 8 to NVLink 72 was driven specifically by the requirements of MoE architectures.
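A rough capacity check shows why such a model cannot live on one device. This is a back-of-envelope sketch: the per-GPU memory figure and FP16 weight format are assumptions for illustration, not published specs.

```python
import math

# Back-of-envelope: why a multi-trillion-parameter MoE model needs a rack,
# not a single GPU. All figures below are illustrative assumptions.
params = 10e12          # 10 trillion parameters (upper end of the quoted range)
bytes_per_param = 2     # assumed FP16/BF16 weights
weight_bytes = params * bytes_per_param   # 20 TB for weights alone

gpu_hbm_bytes = 288e9   # assumed HBM capacity per GPU (~288 GB)
gpus_needed = math.ceil(weight_bytes / gpu_hbm_bytes)
print(f"{weight_bytes / 1e12:.0f} TB of weights -> at least {gpus_needed} GPUs")
```

Under these assumptions the weights alone occupy nearly all of a 72-GPU rack before counting KV cache and activations, which is why presenting the rack as one logical GPU, rather than 72 networked ones, matters for MoE serving.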
The Vera Rubin pod (announced 2026) comprises:
- 7 chip types, 5 rack types, 40 racks
- 1,100+ Rubin GPUs, ~20,000 NVIDIA dies
- 1.2 quadrillion transistors
- 60 exaflops compute; 10 petabytes/second scale bandwidth
- New storage accelerators and Vera CPU added for agentic-ai workloads (tool use and file I/O)
Each NVLink 72 rack alone contains 1.3–1.5 million components and roughly 1,300 chips, and is assembled in the supply chain rather than in the data centre.
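The pod figures above can be cross-checked against each other. This is simple arithmetic on the quoted numbers, not independent data:

```python
# Sanity-checking the quoted Vera Rubin pod figures against each other.
transistors = 1.2e15   # 1.2 quadrillion transistors across the pod
dies = 20_000          # ~20,000 NVIDIA dies
gpus = 1_100           # 1,100+ Rubin GPUs
exaflops = 60          # pod-level compute

print(f"~{transistors / dies / 1e9:.0f}B transistors per die on average")
print(f"~{exaflops * 1e3 / gpus:.0f} PF per GPU (precision-dependent)")
```

The per-die average (~60B transistors) and per-GPU compute (~55 PF) are consistent with a mix of large GPU dies and smaller networking/CPU dies, so the headline figures hang together.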
Supply Chain Integration
Rack-scale systems are too dense for on-site data-centre assembly. NVIDIA now builds complete supercomputer racks in the supply chain and ships them as finished units weighing 2–3 tonnes each. This shifts gigawatt-scale power and cooling requirements into manufacturing facilities, requiring capital investment from supply-chain partners.
See ai-factory for economic context, cuda for the software layer, and ai-scaling-laws for the demand drivers.