Large Language Models (LLMs)

Large language models (LLMs) are neural networks — principally transformer-based — trained on large text corpora to predict the next token given a context. Scale in parameters, data, and compute produces emergent capabilities: reasoning, code generation, question answering, and multimodal understanding.
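The training objective itself is simple to state: maximise the probability of the observed next token. A minimal numpy sketch of one prediction step (the toy vocabulary and logits are illustrative, not from any real model):

```python
import numpy as np

def next_token_loss(logits, target_id):
    """Cross-entropy loss for a single next-token prediction step.

    logits: unnormalised scores over the vocabulary, shape [V].
    target_id: index of the token that actually came next in the corpus.
    """
    # Softmax turns logits into a probability distribution over the vocabulary.
    shifted = logits - logits.max()                  # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # The training signal is the negative log-probability of the true token.
    return -np.log(probs[target_id])

# Toy vocabulary of 5 tokens; the model strongly favours token 2.
logits = np.array([0.1, 0.2, 3.0, 0.1, 0.1])
print(next_token_loss(logits, 2))  # low loss: the favoured token was correct
print(next_token_loss(logits, 4))  # high loss: an unlikely token came next
```

Averaged over trillions of such steps, minimising this loss is the whole of pre-training; everything else in the pipeline refines what that objective produces.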

Architecture Evolution

Model architectures evolve roughly every six months (Huang, 2026), while hardware architectures evolve every three years — creating a co-design challenge. Key architectural developments referenced in fridman-huang-2026-nvidia-ai-revolution:

  • Transformers — original architecture; self-attention across all tokens
  • Mixture-of-Experts (MoE) — sparse activation; only a subset of model parameters engaged per token; enables larger parameter counts at lower inference cost. Drove nvidia’s NVLink 72 design.
  • SSM + Transformer hybrids — nvidia’s Nemotron 3 (120B parameters, open weights) combines transformers with state-space models (SSMs), enabling efficient sequence modelling.
  • Diffusion models — NVIDIA contributed to progressive GANs and the path to diffusion, now used for image/video generation
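The sparse activation behind MoE can be sketched in a few lines. This is a toy single-token version with plain matrices standing in for experts; real MoE layers route batches of tokens through feed-forward experts with load-balancing losses, and all names and shapes here are illustrative:

```python
import numpy as np

def moe_forward(x, expert_weights, router_weights, top_k=2):
    """Sparse mixture-of-experts layer: route a token to its top_k experts.

    x: token activations, shape [d].
    expert_weights: list of per-expert matrices, each [d, d].
    router_weights: router matrix, shape [num_experts, d].
    """
    scores = router_weights @ x                 # one routing score per expert
    top = np.argsort(scores)[-top_k:]           # indices of the top_k experts
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over chosen
    # Only top_k expert matrices do any work; the rest stay idle, which is
    # why a huge total parameter count costs relatively little per token.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gate, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(num_experts)]
router = rng.standard_normal((num_experts, d))
y = moe_forward(rng.standard_normal(d), experts, router)
print(y.shape)  # (8,)
```

With 16 experts and top_k=2, only an eighth of the expert parameters touch each token, which is the inference-cost advantage the note attributes to MoE.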

LLMs and ai-scaling-laws

LLM capability follows power-law scaling with model size and compute. The four scaling axes (ai-scaling-laws) all apply to LLMs: pre-training, post-training refinement, test-time reasoning (chain-of-thought, search), and agentic deployment (using LLMs as the reasoning core of autonomous agents).
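As a concrete illustration of the power-law form, a Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta can be evaluated directly. The constants below are the widely quoted Chinchilla fit, used purely for illustration; any specific model family would have different fitted values:

```python
def scaling_loss(n_params, n_tokens,
                 E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla-style parametric loss: E + A/N^alpha + B/D^beta.

    E is the irreducible loss; the two power-law terms shrink as
    parameter count (N) and training tokens (D) grow. Constants are
    the widely quoted Chinchilla fit, here for illustration only.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling both parameters and data by 10x moves loss towards E,
# but with diminishing returns — the signature of a power law.
print(scaling_loss(1e9, 2e10))    # ~1B params, ~20B tokens
print(scaling_loss(1e10, 2e11))   # ~10B params, ~200B tokens
```

The same diminishing-returns shape is what motivates the other three axes: once pre-training gains per dollar flatten, post-training, test-time compute, and agentic scaffolding become the cheaper places to buy capability.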

From Language to Physical AI

jensen-huang argues that AI is not just language: biology, chemistry, physics, weather modelling, and robotics all require domain-specific models. NVIDIA’s open-source strategy targets these modalities to ensure every industry can access frontier AI capabilities. See physical-ai and open-source-ai.

The Open-Weight Landscape (Early 2026)

nathan-lambert and sebastian-raschka document a rich open-weight ecosystem as of early 2026 (fridman-lambert-raschka-2026-state-of-ai):

| Model | Developer | Notes |
| --- | --- | --- |
| DeepSeek V3 / R1 | deepseek | MoE; permissive licence; RLVR breakthrough |
| Qwen 2.5 series | Alibaba | 50T training tokens; various sizes |
| MiniMax / Kimi K2 Thinking | MiniMax / Moonshot | Large MoE; thinking model |
| GLM-4 | Z.ai (Zhipu AI) | Challenging DeepSeek by early 2026 |
| Mistral Large 3 | Mistral | EU-based; well-documented |
| gpt-oss-120b | OpenAI | OpenAI’s first open model since GPT-2 |
| Nemotron 3 Super | nvidia | 120B MoE; open weights + data + recipe |
| OLMo 3 | allen-institute-for-ai | Fully open data, code, and weights |
| SmolLM | HuggingFace | Small, efficient models |

Chinese models dominate the large-MoE tier with permissive licences, while US/EU labs lead in smaller, well-documented models. The motivations for open release: gaining developer mindshare globally (especially where API-security concerns block Chinese-hosted inference), enabling fine-tuning on proprietary data, and (for OpenAI) offloading inference compute to the community.

Training Pipeline

nathan-lambert describes a three-phase pipeline:

  1. Pre-training — Next-token prediction on trillions of tokens. Encodes most of the model’s knowledge. Synthetic data and OCR-extracted academic PDFs (arXiv, Semantic Scholar) are among the highest-quality sources.
  2. Mid-training — The same next-token objective, focused on specific capabilities (long context, reasoning traces). Instils the desired skills without catastrophic forgetting of pre-trained knowledge, and prepares the model for RLVR.
  3. Post-training — SFT → rlvr → RLHF. RLVR unlocks new skills; RLHF finishes style, tone, and formatting.
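What distinguishes RLVR from RLHF is the reward itself: a programmatic check rather than a learned preference model. A minimal sketch of such a reward function (the `#### answer` convention is a GSM8K-style assumption for illustration; real pipelines use unit tests, checkers, or graders):

```python
import re

def verifiable_reward(completion: str, expected_answer: str) -> float:
    """RLVR-style reward: 1.0 if the completion's final answer checks out.

    Unlike RLHF's learned preference model, the reward is computed by a
    program, so it cannot be gamed by fluent-but-wrong reasoning. Here the
    check is exact-match on a '#### answer' suffix; in practice it might
    be a test suite, a proof checker, or a symbolic evaluator.
    """
    match = re.search(r"####\s*(.+)$", completion.strip())
    return 1.0 if match and match.group(1).strip() == expected_answer else 0.0

# Hypothetical chain-of-thought completions ending in '#### <answer>'.
print(verifiable_reward("17 + 25 = 42 #### 42", "42"))  # 1.0
print(verifiable_reward("17 + 25 = 41 #### 41", "42"))  # 0.0
```

During RLVR the policy model samples many completions per problem, and this binary signal (rather than human preference scores) drives the policy-gradient update.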

LLM Selection Patterns

By early 2026, users pick models based on a single memorable win, stick with them until a notable failure, then switch — analogous to browser or OS loyalty. Lambert’s personal mix: Claude Opus 4.5 for coding and philosophy; GPT-5.2 Thinking for information retrieval; Gemini for fast/search queries; Grok 4 Heavy as a debugging fallback.

See transformer-architecture for architecture details and mixture-of-experts for the dominant architectural pattern.


Sources: fridman-huang-2026-nvidia-ai-revolution | fridman-lambert-raschka-2026-state-of-ai