Large Language Models (LLMs)
Large language models (LLMs) are neural networks — principally transformer-based — trained on large text corpora to predict the next token given a context. Scale in parameters, data, and compute produces emergent capabilities: reasoning, code generation, question answering, and multimodal understanding.
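In code, "predict the next token" reduces to a softmax over vocabulary logits followed by a decoding rule. A minimal sketch with a hypothetical four-word vocabulary and made-up logits (greedy decoding shown; real models sample from the distribution):

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and logits a model might emit for the
# context "The cat sat on the" -- illustrative numbers only.
vocab = ["mat", "dog", "moon", "chair"]
logits = [4.0, 1.0, 0.5, 2.0]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy: pick the argmax
```

Scaling this loop over trillions of tokens is the entire pre-training objective; everything else (MoE, hybrids, post-training) changes how the logits are computed, not what is predicted.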
Architecture Evolution
Model architectures evolve roughly every six months (Huang, 2026), while hardware architectures evolve every three years — creating a co-design challenge. Key architectural developments referenced in fridman-huang-2026-nvidia-ai-revolution:
- Transformers — original architecture; self-attention across all tokens
- Mixture-of-Experts (MoE) — sparse activation; only a subset of model parameters engaged per token; enables larger parameter counts at lower inference cost. Drove nvidia’s NVLink 72 design.
- SSM + Transformer hybrids — nvidia’s Nemotron 3 (120B parameters, open weights) combines transformers with state-space models (SSMs), enabling efficient sequence modelling.
- Diffusion models — NVIDIA contributed to progressive GANs and the path to diffusion, now used for image/video generation
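The MoE sparsity mentioned above comes from a router that activates only the top-k experts per token. A minimal sketch of top-k gating (expert count, logits, and k are illustrative, not any specific model's configuration):

```python
import math

def top_k_route(router_logits, k=2):
    """Select the k experts with the highest router scores (top-k gating)."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Renormalise the selected experts' scores into mixing weights.
    exps = [math.exp(router_logits[i]) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# 8 experts, but only 2 fire for this token -- the sparsity that lets
# MoE models grow total parameters without growing per-token compute.
logits = [0.1, 2.3, -1.0, 0.4, 1.9, -0.2, 0.0, 0.7]
active = top_k_route(logits, k=2)
```

Because every token can route to a different expert pair, all experts must stay resident and reachable at low latency, which is the interconnect pressure behind designs like NVLink 72.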
LLMs and ai-scaling-laws
LLM capability follows power-law scaling with model size and compute. The four scaling axes (ai-scaling-laws) all apply to LLMs: pre-training, post-training refinement, test-time reasoning (chain-of-thought, search), and agentic deployment (using LLMs as the reasoning core of autonomous agents).
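The parameter-count axis of this power law can be sketched with the fit from Kaplan et al. (2020), L(N) = (N_c / N)^α. The constants below are their published values, used here purely for illustration; they do not come from the sources this note cites:

```python
def scaling_loss(n_params, n_c=8.8e13, alpha=0.076):
    """Kaplan-style power law: test loss falls as a power of parameter
    count N, with fitted constants N_c and alpha (illustrative)."""
    return (n_c / n_params) ** alpha

# Every 10x in parameters multiplies loss by 10**-alpha (~0.84),
# i.e. roughly a 16% reduction -- hence the appetite for scale.
ratio = scaling_loss(1e10) / scaling_loss(1e9)
```

The same power-law shape is what makes the other three axes attractive: once pre-training gains per dollar flatten, post-training, test-time compute, and agentic scaffolding offer fresh curves to climb.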
From Language to Physical AI
jensen-huang argues that AI is not just language: biology, chemistry, physics, weather modelling, and robotics all require domain-specific models. NVIDIA’s open-source strategy targets these modalities to ensure every industry can access frontier AI capabilities. See physical-ai and open-source-ai.
The Open-Weight Landscape (Early 2026)
nathan-lambert and sebastian-raschka document a rich open-weight ecosystem as of early 2026 (fridman-lambert-raschka-2026-state-of-ai):
| Model | Developer | Notes |
|---|---|---|
| DeepSeek V3 / R1 | deepseek | MoE; permissive licence; RLVR breakthrough |
| Qwen 2.5 series | Alibaba | 50T training tokens; various sizes |
| MiniMax / Kimi K2 Thinking | MiniMax / Moonshot | Large MoE; thinking model |
| GLM-4 | Z.ai (Zhipu AI) | Challenging DeepSeek by early 2026 |
| Mistral Large 3 | Mistral | EU-based; well-documented |
| gpt-oss-120b | OpenAI | OpenAI’s first open model since GPT-2 |
| Nemotron 3 Super | nvidia | 120B MoE; open weights + data + recipe |
| OLMo 3 | allen-institute-for-ai | Fully open data, code, and weights |
| SmolLM | HuggingFace | Small efficient models |
Chinese models dominate the large-MoE tier with permissive licences; US and EU labs lead in smaller, well-documented releases. The motivations for open release: gaining developer mindshare globally (especially where API security concerns block Chinese-hosted inference), enabling fine-tuning on proprietary data, and (for OpenAI) offloading inference compute to the community.
Training Pipeline
nathan-lambert describes a three-phase pipeline:
- Pre-training — Next-token prediction on trillions of tokens. Encodes most of the model’s knowledge. Synthetic data and OCR-extracted academic PDFs (arXiv, Semantic Scholar) are among the highest-quality sources.
- Mid-training — The same next-token objective, focused on specific capabilities (long context, reasoning traces). Reinforces the newly targeted skills so they are not catastrophically forgotten, and prepares the model for RLVR.
- Post-training — SFT → rlvr → RLHF. RLVR unlocks skills; RLHF finishes style, tone, formatting.
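What makes RLVR distinct from RLHF is the reward source: a programmatic check against ground truth rather than a learned preference model. A minimal sketch, assuming a hypothetical convention where the model ends its output with "Answer: <value>" (the marker and extraction logic are illustrative, not any lab's actual format):

```python
def verifiable_reward(completion, expected_answer):
    """RLVR-style reward: 1.0 if the extracted final answer matches a
    checkable ground truth, else 0.0 -- no learned reward model needed."""
    marker = "Answer:"  # hypothetical output convention
    if marker not in completion:
        return 0.0  # malformed output earns nothing
    answer = completion.rsplit(marker, 1)[1].strip()
    return 1.0 if answer == expected_answer else 0.0

r_good = verifiable_reward("Step 1... Step 2... Answer: 42", "42")
r_bad = verifiable_reward("I think it is probably 42.", "42")
```

Because the reward is binary and automatically checkable, it scales to millions of math and code problems, which is why RLVR unlocks skills while RLHF is reserved for finishing style, tone, and formatting.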
LLM Selection Patterns
By early 2026, users pick models based on a single memorable win, stick with them until a notable failure, then switch — analogous to browser or OS loyalty. Lambert’s personal mix: Claude Opus 4.5 for coding and philosophy; GPT-5.2 Thinking for information retrieval; Gemini for fast/search queries; Grok 4 Heavy as a debugging fallback.
See transformer-architecture for architecture details and mixture-of-experts for the dominant architectural pattern.
Sources: fridman-huang-2026-nvidia-ai-revolution | fridman-lambert-raschka-2026-state-of-ai