Source Summary: Fridman × Lambert × Raschka (2026) — State of AI
Source: Lex Fridman Podcast #490
Guests: Nathan Lambert (post-training lead, AI2; RLHF book author) & Sebastian Raschka (ML researcher; Build a LLM From Scratch author)
Host: lex-fridman
Published: 2026-01-31
Format: Transcript (~4 hours, 26 chapters)
Raw file: raw/articles/Transcript for State of AI in 2026 LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI Lex Fridman Podcast 490.md
Overview
Technical deep-dive into the AI landscape as of early 2026, covering the competitive dynamics between US and Chinese labs, the evolution of large-language-models, the status of ai-scaling-laws, post-training techniques (especially rlvr), artificial-general-intelligence timelines, and the future of programming. More technically grounded than fridman-huang-2026-nvidia-ai-revolution; focuses on model training and research rather than infrastructure.
China vs US AI Race
deepseek’s R1 (January 2025) was the defining geopolitical AI moment of 2025 — near-SOTA performance at a fraction of claimed cost, triggering a wave of Chinese open-weight model releases. By early 2026, DeepSeek is losing its crown to Z.ai (GLM models), MiniMax, and Kimi K2 Thinking (Moonshot). Lambert and Raschka’s assessment: no winner-takes-all scenario; differentiation comes from budget and hardware, not proprietary ideas, since researchers rotate between labs frequently.
Chinese models trend toward large open-weight MoEs with permissive licences (unrestricted vs. Llama’s user-cap terms), making them attractive for enterprise fine-tuning. US frontier labs (OpenAI, anthropic, Google DeepMind) remain ahead on output quality; users pay for the margin.
Claude Opus 4.5 (Anthropic) dominated the “AI Twitter” hype cycle around coding. openai GPT-5 cut serving costs with a routing architecture that sends most queries to cheaper sub-models. Gemini 3 (Google) was positioned as the long-context leader until GPT-5.2 shipped a notable update.
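The routing idea can be sketched in a few lines. OpenAI has not published GPT-5’s routing internals, so the model names, difficulty heuristic, and threshold below are all invented for illustration:

```python
# Hypothetical sketch of query routing: cheap sub-model for most queries,
# an expensive reasoning model for hard ones. Heuristic and names are
# made up; this is not OpenAI's actual router.

def estimate_difficulty(query: str) -> float:
    """Toy heuristic: longer, code/math-flavoured queries score higher."""
    score = min(len(query) / 500, 1.0)
    if any(kw in query.lower() for kw in ("prove", "debug", "refactor")):
        score += 0.5
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    """Send easy queries to a cheap sub-model, hard ones to the big one."""
    if estimate_difficulty(query) >= threshold:
        return "big-reasoning-model"
    return "cheap-fast-model"

print(route("What's the capital of France?"))               # cheap-fast-model
print(route("Debug this distributed training deadlock..."))  # big-reasoning-model
```

In production the router would itself be a learned classifier, but the cost logic is the same: pay for the expensive model only when the query warrants it.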
LLM Landscape: Who’s Winning
| Model | Strength | Lambert’s Usage |
|---|---|---|
| Claude Opus 4.5 | Code, reasoning, voice | Primary for coding + philosophy |
| GPT-5.2 Thinking | Information retrieval, long context | 5 simultaneous pro queries |
| Gemini | Fast, long-context, search integration | Fast queries |
| Grok 4 Heavy | Hardcore debugging | Debugging fallback |
Key insight: users pick models based on a single memorable win, stick with them until a failure, then switch — analogous to browser or OS loyalty.
Open vs Closed LLMs
Strong open-weight ecosystem by early 2026: DeepSeek, Qwen (Alibaba), MiniMax, Kimi, Mistral Large 3, gpt-oss-120b (OpenAI’s first open model since GPT-2), nvidia Nemotron 3, OLMo (AI2), LLM360, Apertus, SmolLM (HuggingFace). Chinese models dominate the large-MoE tier; US/EU models lead in smaller well-documented models.
Motivations for open release: gaining developer mindshare globally (especially where API security concerns block Chinese-hosted inference); enabling fine-tuning on proprietary domain data; and for OpenAI, offloading inference compute costs to the community.
Legal issue: anthropic paid $1.5B to settle a lawsuit over torrenting (not just purchasing) books for training data — a landmark case for training-data licensing.
ai-scaling-laws — Status Check
Lambert’s three-axis framework: (1) pre-training (compute + data); (2) RL training (RLVR); (3) inference-time scaling. All three are still working but with different rates of low-hanging fruit.
Pre-training: Still improves models; cost is shifting from training to inference (serving 100M+ users dwarfs the $2–5M training run). Biggest pre-training clusters (gigawatt-scale) coming online in 2026 from xAI and others. Data quality, not volume, is now the binding constraint.
Post-training (RLVR): The most exciting current frontier. Scales with a log-linear relationship (10x compute → linear eval improvement). Enables inference-time scaling. Grok 4 spent comparable compute on RL as on pre-training.
Inference-time scaling: Already commercialised (o1, Claude Opus 4.5 extended thinking); generates “hidden thoughts” before the first output token. Enables tool use, software engineering, debugging.
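The log-linear RL scaling claim above (10x compute buys a roughly constant additive eval gain) can be made concrete with toy numbers. The base score and gain-per-decade constants here are illustrative, not measured values from the episode:

```python
import math

def eval_score(compute_flops: float, base_flops: float = 1e21,
               base_score: float = 50.0, gain_per_10x: float = 8.0) -> float:
    """Log-linear scaling: every 10x in RL compute adds a fixed number
    of eval points. All constants are illustrative."""
    return base_score + gain_per_10x * math.log10(compute_flops / base_flops)

for flops in (1e21, 1e22, 1e23):
    print(f"{flops:.0e} FLOPs -> eval {eval_score(flops):.1f}")
```

The practical consequence is diminishing absolute returns per dollar: each additional eval point costs an order of magnitude more compute than the last.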
rlvr — The Key 2025 Technique
RLVR (Reinforcement Learning with Verifiable Rewards): term coined by AI2’s Tulu 3 team, popularised by deepseek R1 which demonstrated its scaling properties. The model generates answers to verifiable problems (math, code), gets a binary correct/incorrect reward, and takes RL gradient updates. No human labelling required.
Key properties:
- Scales indefinitely (unlike reinforcement-learning-from-human-feedback which plateaus due to reward model over-optimisation)
- Unlocks inference-time scaling: models trained with RLVR naturally generate longer chain-of-thought reasoning
- Observed “aha moment”: models learn to self-correct mid-reasoning (e.g., “I made an error, let me retry”)
- Works even when intermediate reasoning steps are incorrect — the grading of final answers is sufficient
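The loop described above (generate, verify, binary reward, gradient update) can be sketched with a toy problem. The “policy” here is a softmax over candidate answers updated by REINFORCE; a real pipeline would sample from an LLM and verify with a math/code checker, but the reward logic is the same:

```python
import random, math

# Toy RLVR loop: softmax policy over candidate answers to one verifiable
# problem, REINFORCE updates on a binary correct/incorrect reward.
# Problem, candidates, and learning rate are stand-ins for an LLM pipeline.

random.seed(0)
problem, correct = "What is 17 * 3?", 51
candidates = [41, 48, 51, 57]      # possible "generations"
logits = [0.0] * len(candidates)   # policy parameters
lr = 0.5

def sample(logits):
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return len(probs) - 1, probs

for step in range(200):
    i, probs = sample(logits)
    reward = 1.0 if candidates[i] == correct else 0.0   # verifier: binary grade
    # REINFORCE: d log pi(i) / d logit_j = 1[i == j] - probs[j]
    for j in range(len(logits)):
        logits[j] += lr * reward * ((1.0 if j == i else 0.0) - probs[j])

best = candidates[max(range(len(logits)), key=lambda j: logits[j])]
print(best)  # policy concentrates on the verified answer: 51
```

Note the key property from the list above: the update never inspects intermediate reasoning, only the final graded answer.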
RLVR 2.0 prediction: process reward models (grading intermediate steps, not just final answers) and expansion into open-ended scientific domains. Value functions (from deep RL) are the next candidate now that process reward models have proved troublesome in practice.
Transformer Architecture — How Much Has Changed?
Raschka’s verdict: fundamentally unchanged since GPT-2. The architecture is still an autoregressive, decoder-only transformer built from attention + FFN blocks; the innovations are tweaks:
| Innovation | What it is |
|---|---|
| mixture-of-experts (MoE) | Sparse FFN activation; router sends tokens to subset of “experts”; larger capacity without proportional compute |
| Group Query Attention (GQA) | Reduces KV cache size; cheaper long-context |
| Multi-Head Latent Attention (MLA) | DeepSeek’s KV cache compression technique |
| RMSNorm | Replaces LayerNorm; marginal improvement |
| SwiGLU activation | Nonlinearity tweak |
| Sliding Window Attention | Local attention window; OLMo 3 |
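The MoE row above can be illustrated with a minimal sparse feed-forward layer: a router scores each token and only the top-k experts run. Shapes and expert count are arbitrary; real MoE layers add load-balancing losses and fused kernels:

```python
import numpy as np

# Minimal sparse-MoE forward pass: router picks top-k experts per token,
# the remaining experts stay idle (capacity without proportional compute).

rng = np.random.default_rng(0)
d_model, n_experts, top_k, n_tokens = 8, 4, 2, 5

W_router = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
x = rng.normal(size=(n_tokens, d_model))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

scores = softmax(x @ W_router)                  # (tokens, experts)
top = np.argsort(scores, axis=-1)[:, -top_k:]   # top-k expert ids per token

out = np.zeros_like(x)
for t in range(n_tokens):
    weights = scores[t, top[t]] / scores[t, top[t]].sum()  # renormalise top-k
    for w, e in zip(weights, top[t]):
        out[t] += w * (x[t] @ experts[e])       # only k of n experts run

print(out.shape)  # (5, 8)
```

Per token, only 2 of 4 expert matmuls execute here; scaling n_experts grows parameter count while per-token FLOPs stay fixed, which is the whole point of the technique.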
Alternative architectures being explored: text-diffusion-models (parallel generation, potentially faster; deployed in code-diff startups), Mamba/SSM hybrids (fixed-state RNN-like; cheaper long-context but lossy).
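The KV-cache savings behind the GQA row in the table can be checked with back-of-envelope arithmetic. The dimensions below are assumed for illustration (roughly Llama-2-70B-like), not figures quoted in the episode:

```python
# KV-cache sizing: 2 tensors (K and V) per layer, fp16 = 2 bytes/element.
# GQA shrinks the cache by reducing the number of KV heads.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per=2):
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per

mha = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=32_768)
gqa = kv_cache_bytes(n_layers=80, n_kv_heads=8,  head_dim=128, seq_len=32_768)
print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB")  # 80.0 vs 10.0
```

An 8x smaller cache per sequence is what makes long-context serving cheap: the same GPU memory holds 8x more concurrent contexts.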
Training Pipeline: Pre → Mid → Post
- Pre-training: Next-token prediction on trillions of tokens (Qwen reportedly 50T, rumoured frontier labs up to 100T). Most of the model’s knowledge is encoded here. Synthetic data is valuable for quality, not just volume; OCR-extracted PDFs (arXiv, Semantic Scholar) are high-quality sources.
- Mid-training: Same next-token objective but focused on specific capabilities (long context, reasoning traces), instilling new skills without catastrophic forgetting of what pre-training learned. Supplies the reasoning-trace data that makes RLVR work.
- Post-training: SFT → RLVR → RLHF. RLVR unlocks skills; RLHF finishes style, tone, and formatting. RLHF does not scale like RLVR and reaches diminishing returns quickly.
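The pre-training objective in the first bullet is plain next-token cross-entropy. A toy sketch, using a unigram count table as a stand-in for the transformer:

```python
import math

# Next-token prediction loss: negative log-likelihood of each actual next
# token under the model's distribution. The "model" here is a unigram
# table, standing in for a real transformer; the loss is the same shape.

corpus = "the cat sat on the mat the cat ran".split()
counts = {w: corpus.count(w) for w in set(corpus)}
total = len(corpus)

def p(token):                       # unigram "model" probability
    return counts[token] / total

# Average NLL over every next-token position in the corpus
nll = -sum(math.log(p(tok)) for tok in corpus[1:]) / (len(corpus) - 1)
print(f"avg NLL: {nll:.3f}  (perplexity {math.exp(nll):.2f})")
```

Pre-, mid-, and post-training SFT all minimise a loss of this form; what changes across the stages is the data mix, not the algorithm.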
AGI Timeline & Future of Programming
Lambert and Raschka reject clean AGI definitions. Key positions:
- AI capabilities are jagged: superhuman at some code types, poor at distributed ML systems and safety-critical engineering
- The “AI 2027” report (which predicted a superhuman coder by 2027–28, later revising its mean estimate to 2031) is praised for concrete milestones, but its singularity assumptions are disputed
- Lambert: narrow software tasks get automated this year; a fully automated AI researcher is more than 5–10 years away
- Raschka: AGI dream is “amplification not paradigm change” — continuous improvement, not a step function
- Lambert: “the dream of AGI” (one model solving everything) is “kind of dying” — specialized models will dominate
Programming future: senior developers already ship 50%+ AI-generated code. Role shifts toward system design, specification, and product management. From-scratch website generation is nearly solved; complex production codebases (Chrome-level) remain hard.
Entities Mentioned
- nathan-lambert — Post-training lead, AI2; co-coined “RLVR”; RLHF book author
- sebastian-raschka — ML researcher; Build a LLM From Scratch (2024); independent educator
- lex-fridman — Host
- deepseek — Chinese open-weight lab; R1 and V3 models; RLVR scaling breakthrough
- anthropic — Claude Opus 4.5; coding-focused culture; $1.5B training data lawsuit
- openai — GPT-5, GPT-5.2, gpt-oss-120b; o1 reasoning model; chaotic but research-innovative
- allen-institute-for-ai (AI2) — OLMo models; open data + code; Nathan Lambert’s employer
- nvidia — Nemotron 3; GPU infrastructure for training; FP8/FP4 system optimizations
Assessment
Credibility: High — Lambert and Raschka are working researchers with hands-on training experience. Bias: Academic/open-source perspective; sceptical of AGI hype; bullish on RLVR. Utility: Best source in this knowledge base for post-training techniques, LLM architecture details, and the open-weight model landscape.