AI Factory
AI factory is jensen-huang’s term for the modern AI data centre, framed as a production facility that manufactures tokens (AI-generated outputs) rather than a warehouse that stores and retrieves pre-written files.
The Warehouse → Factory Transition
Traditional computing: store pre-written content, retrieve it with search and recommendation. Economics: storage cost is the primary value driver; retrieval is cheap.
AI computing: contextually aware, generative. Every query requires processing and generation of tokens in real time. Economics: compute is the primary cost; token production is the product.
Huang: “Warehouses don’t make much money. Factories directly correlate with the company’s revenues.” The implication is that the total addressable market for AI compute grows far faster than for traditional storage-oriented compute, because factories generate revenues, not just costs.
Token Segmentation — The iPhone Analogy
Tokens are not a single commodity. Huang predicts a tiered market:
- Free tokens — consumer-grade AI responses
- Mid-tier tokens — business applications
- Premium tokens — specialised, high-stakes inference (e.g., drug discovery, legal, financial analysis) — Huang suggests $1,000/million tokens is “just around the corner”
This mirrors smartphone economics: free apps → paid apps → enterprise software, all running on the same hardware platform. openclaw-style agentic-ai systems are the “iPhone moment” that makes the token economy legible to everyday users.
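The tiering above is essentially a pricing table. A minimal sketch of the economics, using the $1,000/million premium figure quoted above and assumed placeholder prices for the other tiers (not NVIDIA figures):

```python
# Hypothetical token-pricing tiers illustrating the segmentation.
# Only the premium price comes from the text; "free" and "mid" are
# assumed for illustration.
PRICE_PER_MILLION = {
    "free": 0.0,        # consumer-grade responses
    "mid": 10.0,        # assumed business-application rate
    "premium": 1000.0,  # high-stakes inference (drug discovery, legal)
}

def revenue(tier: str, tokens: int) -> float:
    """Revenue in dollars for generating `tokens` tokens in a tier."""
    return PRICE_PER_MILLION[tier] * tokens / 1_000_000

# The same billion tokens are worth wildly different amounts by tier:
for tier in PRICE_PER_MILLION:
    print(tier, revenue(tier, 1_000_000_000))
```

The point of the iPhone analogy falls out of the numbers: identical hardware, and the value lives entirely in which tier the workload occupies.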
Scale and Economics
nvidia estimates:
- Token cost falls an order of magnitude per year (driven by tokens-per-watt efficiency gains)
- GPU/rack prices rise, but token generation efficiency rises faster
- Manufacturing capacity of 200 Vera Rubin pods per week is needed to meet projected demand
- Each rack: 1.3–1.5 million components, 200 suppliers, 2–3 tonnes shipped pre-assembled
Huang projects GDP growth acceleration and a 100× increase in computing’s share of GDP, because compute shifts from cost centre to profit centre.
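The claimed cost trajectory is simple compounding: an order-of-magnitude (10×) decline per year. A sketch with an assumed starting price (the $10/million figure is a placeholder, not an NVIDIA number):

```python
# Token cost falling 10x per year, per the estimate above.
START_COST = 10.0  # assumed $ per million tokens in year 0
DECLINE = 10.0     # order-of-magnitude decline per year

def cost_per_million(year: int) -> float:
    """Projected $ per million tokens after `year` years."""
    return START_COST / DECLINE ** year

for y in range(4):
    print(y, cost_per_million(y))
```

After three years the same tokens cost 1,000× less, which is how rising rack prices can coexist with falling cost per token: efficiency compounds faster than hardware prices rise.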
Physical Requirements
AI factories at scale imply:
- Power: gigawatt-class facilities; Huang advocates using idle grid capacity (60% headroom 99% of the time) with graceful degradation contracts rather than demanding six-nines uptime
- Cooling: conduction and convection at rack scale; radiant cooling in space for satellite-based deployments
- Supply chain: 200-supplier orchestration; supercomputer assembly now occurs in the supply chain, not the data centre, requiring gigawatts of build-and-test power in manufacturing
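The idle-grid-capacity argument in the power bullet can be made concrete with a back-of-envelope calculation. Grid size is an assumed illustration; the 60%/99% figures come from the text:

```python
# A facility that accepts curtailment ~1% of the time can draw on the
# grid headroom that exists 99% of the time, instead of paying for
# six-nines firm power.
GRID_PEAK_GW = 10.0       # assumed regional grid capacity (illustrative)
HEADROOM_FRACTION = 0.60  # spare capacity available 99% of the time
AVAILABILITY = 0.99       # fraction of hours the headroom exists

flexible_gw = GRID_PEAK_GW * HEADROOM_FRACTION
expected_gw = flexible_gw * AVAILABILITY  # average usable power

print(flexible_gw, expected_gw)  # 6.0 GW flexible, ~5.94 GW expected
```

Graceful-degradation contracts trade that ~1% of curtailed hours for access to gigawatt-class power without new generation build-out.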
See rack-scale-computing for hardware architecture and cuda for the software layer.