CUDA
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and API, launched in 2006–2007. It enables developers to write general-purpose code (originally in C, later C++, Python, and many other languages) that runs on NVIDIA GPUs. CUDA is now at version 13.2 and underpins the entire modern AI infrastructure stack.
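To make "general-purpose code that runs on NVIDIA GPUs" concrete, here is a minimal sketch of the classic CUDA vector-add kernel. This is an illustrative example, not code from the source; it assumes an NVIDIA GPU and the `nvcc` compiler, and uses unified (managed) memory to keep the sketch short:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread handles one array element — the core CUDA execution model:
// thousands of lightweight threads, indexed by block and thread IDs.
__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int n = 1 << 10;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    // Managed memory is accessible from both CPU and GPU (CUDA 6+).
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    vector_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();                    // wait for the GPU to finish

    printf("c[0] = %.1f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is the small C++ extension that made GPU programming accessible to a mainstream developer audience — the same model scales from this toy kernel up to production training workloads.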
Strategic History
The decision to ship CUDA on consumer GeForce gaming GPUs was, by jensen-huang’s account, the most consequential and most painful strategic choice in NVIDIA’s history:
- Adding CUDA to GeForce increased chip cost by ~50%, consuming all of the company’s gross-margin dollars at the time
- NVIDIA’s market cap fell to ~$1.5 billion; the company remained at low valuations for years
- The rationale: developers go where the install base is; GeForce had millions of units in homes, university labs, and research clusters
- NVIDIA simultaneously went to universities, wrote textbooks, and taught courses to cultivate developers
The result: researchers and scientists discovered CUDA on their gaming PCs, ported scientific workloads to it, and laid the software foundation that enabled the deep-learning revolution when AlexNet (2012) and its successors demonstrated GPU-accelerated neural networks at scale.
The Moat: Install Base + Ecosystem + Trust
Huang is explicit that CUDA’s moat is not primarily technical:
“It’s never been just about the technology. It was the fact that the company was dedicated to it, stuck with it, expanded its reach. It wasn’t three people that made CUDA successful. It was 43,000 people that made CUDA successful.”
Three interlocking moat components:
- Install base — CUDA runs on hundreds of millions of GPUs across every cloud (AWS, Azure, GCP), every industry, every country; developers target CUDA to reach the maximum audience
- Velocity — NVIDIA ships a new generation roughly every year; a developer who invests in CUDA today can expect 10× better performance in 6 months; no competing platform offers this rate of improvement
- Trust — Developers believe NVIDIA will maintain and improve CUDA indefinitely; they can “take it to the bank”
A hypothetical competing platform (“GUDA” or “TUDA”) faces a cold-start problem: without install base, velocity track record, and trust, it cannot attract developers regardless of architectural elegance. Huang cites x86 vs. RISC as historical precedent: x86’s architectural inelegance did not prevent dominance because install base and ecosystem trumped technical merit.
Flexibility as a Design Virtue
CUDA’s balance of specialisation (GPU acceleration) and generality (programmable, adaptable) is why it has survived rapid algorithmic change. When mixture-of-experts (MoE) architectures emerged, CUDA-based systems were adapted; when agentic workloads emerged, CUDA-enabled racks were redesigned. Huang: “The reason CUDA has been so resilient is that incredible balance between specialisation and generalisation.”
CUDA in the AI Factory Era
The “unit of CUDA compute” has expanded over time: chip → computer → cluster → ai-factory. NVIDIA still ships CUDA, but the mental model is now a gigawatt AI factory, not a single GPU. CUDA is the programming layer that spans this entire hierarchy — from individual GPU kernels to distributed training runs across thousands of nodes. See rack-scale-computing.