Source Summary: Fridman × Lambert × Raschka (2026) — State of AI
Source: Lex Fridman Podcast #490
Guests: Nathan Lambert (post-training lead, AI2; RLHF book author) & Sebastian Raschka (ML researcher; Build a LLM From Scratch author)
Host: lex-fridman
Published: 2026-01-31
Format: Transcript (~4 hours, 26 chapters)
Raw file: raw/articles/Transcript for State of AI in 2026 LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI Lex Fridman Podcast 490.md
Overview
Technical deep-dive into the AI landscape as of early 2026, covering the competitive dynamics between US and Chinese labs, the evolution of large-language-models, the status of ai-scaling-laws, post-training techniques (especially rlvr), artificial-general-intelligence timelines, and the future of programming. More technically grounded than fridman-huang-2026-nvidia-ai-revolution; focuses on model training and research rather than infrastructure.
China vs US AI Race
deepseek’s R1 (January 2025) was the defining geopolitical AI moment of 2025 — near-SOTA performance at a fraction of claimed cost, triggering a wave of Chinese open-weight model releases. By early 2026, DeepSeek is losing its crown to Z.ai (GLM models), MiniMax, and Kimi K2 Thinking (Moonshot). Lambert and Raschka’s assessment: no winner-takes-all scenario; differentiation comes from budget and hardware, not proprietary ideas, since researchers rotate between labs frequently.
Chinese models trend toward large open-weight MoEs with permissive licences (unrestricted vs. Llama’s user-cap terms), making them attractive for enterprise fine-tuning. US frontier labs (OpenAI, anthropic, Google DeepMind) remain ahead on output quality; users pay for the margin.
Claude Opus 4.5 (Anthropic) dominated the “AI Twitter” hype cycle around coding. openai GPT-5 cut serving costs with a routing architecture that sends most queries to cheaper sub-models. Gemini 3 (Google) was positioned as the long-context leader until GPT-5.2 shipped a notable update.
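The routing idea can be sketched in a few lines. OpenAI has not published GPT-5’s routing internals, so the model names, difficulty heuristic, and threshold below are all invented for illustration:

```python
# Hypothetical sketch of query routing: cheap sub-model for most queries,
# an expensive reasoning model for hard ones. Heuristic and names are
# made up; this is not OpenAI's actual router.

def estimate_difficulty(query: str) -> float:
    """Toy heuristic: longer, code/math-flavoured queries score higher."""
    score = min(len(query) / 500, 1.0)
    if any(kw in query.lower() for kw in ("prove", "debug", "refactor")):
        score += 0.5
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    """Send easy queries to a cheap sub-model, hard ones to the big one."""
    if estimate_difficulty(query) >= threshold:
        return "big-reasoning-model"
    return "cheap-fast-model"

print(route("What's the capital of France?"))               # cheap-fast-model
print(route("Debug this distributed training deadlock..."))  # big-reasoning-model
```

In production the router would itself be a learned classifier, but the cost logic is the same: pay for the expensive model only when the query warrants it.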
LLM Landscape: Who’s Winning
| Model | Strength | Lambert’s Usage |
|---|---|---|
| Claude Opus 4.5 | Code, reasoning, voice | Primary for coding + philosophy |
| GPT-5.2 Thinking | Information retrieval, long context | 5 simultaneous pro queries |
| Gemini | Fast, long-context, search integration | Fast queries |
| Grok 4 Heavy | Hardcore debugging | Debugging fallback |
Key insight: users pick models based on a single memorable win, stick with them until a failure, then switch — analogous to browser or OS loyalty.
Open vs Closed LLMs
Strong open-weight ecosystem by early 2026: DeepSeek, Qwen (Alibaba), MiniMax, Kimi, Mistral Large 3, gpt-oss-120b (OpenAI’s first open model since GPT-2), nvidia Nemotron 3, OLMo (AI2), LLM360, Apertus, SmolLM (HuggingFace). Chinese models dominate the large-MoE tier; US/EU models lead in smaller well-documented models.
Motivations for open release: gaining developer mindshare globally (especially where API security concerns block Chinese-hosted inference); enabling fine-tuning on proprietary domain data; and for OpenAI, offloading inference compute costs to the community.
Legal issue: anthropic paid $1.5B to settle a lawsuit over torrenting (not just purchasing) books for training data — a landmark case for training-data licensing.
ai-scaling-laws — Status Check
Lambert’s three-axis framework: (1) pre-training (compute + data); (2) RL training (RLVR); (3) inference-time scaling. All three are still working but with different rates of low-hanging fruit.
Pre-training: Still improves models; cost is shifting from training to inference (serving 100M+ users dwarfs the $2–5M training run). Biggest pre-training clusters (gigawatt-scale) coming online in 2026 from xAI and others. Data quality, not volume, is now the binding constraint.
Post-training (RLVR): The most exciting current frontier. Scales with a log-linear relationship (10x compute → linear eval improvement). Enables inference-time scaling. Grok 4 spent comparable compute on RL as on pre-training.
Inference-time scaling: Already commercialised (o1, Claude Opus 4.5 extended thinking); generates “hidden thoughts” before the first output token. Enables tool use, software engineering, debugging.
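The log-linear RL scaling claim above (10x compute buys a roughly constant additive eval gain) can be made concrete with toy numbers. The base score and gain-per-decade constants here are illustrative, not measured values from the episode:

```python
import math

def eval_score(compute_flops: float, base_flops: float = 1e21,
               base_score: float = 50.0, gain_per_10x: float = 8.0) -> float:
    """Log-linear scaling: every 10x in RL compute adds a fixed number
    of eval points. All constants are illustrative."""
    return base_score + gain_per_10x * math.log10(compute_flops / base_flops)

for flops in (1e21, 1e22, 1e23):
    print(f"{flops:.0e} FLOPs -> eval {eval_score(flops):.1f}")
```

The practical consequence is diminishing absolute returns per dollar: each additional eval point costs an order of magnitude more compute than the last.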
rlvr — The Key 2025 Technique
RLVR (Reinforcement Learning with Verifiable Rewards): term coined by AI2’s Tulu 3 team, popularised by deepseek R1 which demonstrated its scaling properties. The model generates answers to verifiable problems (math, code), gets a binary correct/incorrect reward, and takes RL gradient updates. No human labelling required.
Key properties:
- Scales indefinitely (unlike reinforcement-learning-from-human-feedback which plateaus due to reward model over-optimisation)
- Unlocks inference-time scaling: models trained with RLVR naturally generate longer chain-of-thought reasoning
- Observed “aha moment”: models learn to self-correct mid-reasoning (e.g., “I made an error, let me retry”)
- Works even when intermediate reasoning steps are incorrect — the grading of final answers is sufficient
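The loop described above (generate, verify, binary reward, gradient update) can be sketched with a toy problem. The “policy” here is a softmax over candidate answers updated by REINFORCE; a real pipeline would sample from an LLM and verify with a math/code checker, but the reward logic is the same:

```python
import random, math

# Toy RLVR loop: softmax policy over candidate answers to one verifiable
# problem, REINFORCE updates on a binary correct/incorrect reward.
# Problem, candidates, and learning rate are stand-ins for an LLM pipeline.

random.seed(0)
problem, correct = "What is 17 * 3?", 51
candidates = [41, 48, 51, 57]      # possible "generations"
logits = [0.0] * len(candidates)   # policy parameters
lr = 0.5

def sample(logits):
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return len(probs) - 1, probs

for step in range(200):
    i, probs = sample(logits)
    reward = 1.0 if candidates[i] == correct else 0.0   # verifier: binary grade
    # REINFORCE: d log pi(i) / d logit_j = 1[i == j] - probs[j]
    for j in range(len(logits)):
        logits[j] += lr * reward * ((1.0 if j == i else 0.0) - probs[j])

best = candidates[max(range(len(logits)), key=lambda j: logits[j])]
print(best)  # policy concentrates on the verified answer: 51
```

Note the key property from the list above: the update never inspects intermediate reasoning, only the final graded answer.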
RLVR 2.0 prediction: process reward models (grading intermediate steps, not just final answers) and expansion into open-ended scientific domains. Value functions (from deep RL) are the next candidate now that process reward models have proved troublesome in practice.
Transformer Architecture — How Much Has Changed?
Raschka’s verdict: fundamentally unchanged since GPT-2. The architecture is still an autoregressive, decoder-only transformer built from attention + FFN blocks; the innovations are tweaks:
| Innovation | What it is |
|---|---|
| mixture-of-experts (MoE) | Sparse FFN activation; router sends tokens to subset of “experts”; larger capacity without proportional compute |
| Group Query Attention (GQA) | Reduces KV cache size; cheaper long-context |
| Multi-Head Latent Attention (MLA) | DeepSeek’s KV cache compression technique |
| RMSNorm | Replaces LayerNorm; marginal improvement |
| SwiGLU activation | Nonlinearity tweak |
| Sliding Window Attention | Local attention window; OLMo 3 |
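The MoE row above can be illustrated with a minimal sparse feed-forward layer: a router scores each token and only the top-k experts run. Shapes and expert count are arbitrary; real MoE layers add load-balancing losses and fused kernels:

```python
import numpy as np

# Minimal sparse-MoE forward pass: router picks top-k experts per token,
# the remaining experts stay idle (capacity without proportional compute).

rng = np.random.default_rng(0)
d_model, n_experts, top_k, n_tokens = 8, 4, 2, 5

W_router = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
x = rng.normal(size=(n_tokens, d_model))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

scores = softmax(x @ W_router)                  # (tokens, experts)
top = np.argsort(scores, axis=-1)[:, -top_k:]   # top-k expert ids per token

out = np.zeros_like(x)
for t in range(n_tokens):
    weights = scores[t, top[t]] / scores[t, top[t]].sum()  # renormalise top-k
    for w, e in zip(weights, top[t]):
        out[t] += w * (x[t] @ experts[e])       # only k of n experts run

print(out.shape)  # (5, 8)
```

Per token, only 2 of 4 expert matmuls execute here; scaling n_experts grows parameter count while per-token FLOPs stay fixed, which is the whole point of the technique.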
Alternative architectures being explored: text-diffusion-models (parallel generation, potentially faster; deployed in code-diff startups), Mamba/SSM hybrids (fixed-state RNN-like; cheaper long-context but lossy).
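The KV-cache savings behind the GQA row in the table can be checked with back-of-envelope arithmetic. The dimensions below are assumed for illustration (roughly Llama-2-70B-like), not figures quoted in the episode:

```python
# KV-cache sizing: 2 tensors (K and V) per layer, fp16 = 2 bytes/element.
# GQA shrinks the cache by reducing the number of KV heads.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per=2):
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per

mha = kv_cache_bytes(n_layers=80, n_kv_heads=64, head_dim=128, seq_len=32_768)
gqa = kv_cache_bytes(n_layers=80, n_kv_heads=8,  head_dim=128, seq_len=32_768)
print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB")  # 80.0 vs 10.0
```

An 8x smaller cache per sequence is what makes long-context serving cheap: the same GPU memory holds 8x more concurrent contexts.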
Training Pipeline: Pre → Mid → Post
- Pre-training: Next-token prediction on trillions of tokens (Qwen reportedly 50T, rumoured frontier labs up to 100T). Most of the model’s knowledge is encoded here. Synthetic data is valuable for quality, not just volume; OCR-extracted PDFs (arXiv, Semantic Scholar) are high-quality sources.
- Mid-training: Same next-token objective but focused on specific capabilities (long context, reasoning traces), instilling new skills without catastrophic forgetting of what pre-training learned. Supplies the reasoning-trace data that makes RLVR work.
- Post-training: SFT → RLVR → RLHF. RLVR unlocks skills; RLHF finishes style, tone, and formatting. RLHF does not scale like RLVR and reaches diminishing returns quickly.
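The pre-training objective in the first bullet is plain next-token cross-entropy. A toy sketch, using a unigram count table as a stand-in for the transformer:

```python
import math

# Next-token prediction loss: negative log-likelihood of each actual next
# token under the model's distribution. The "model" here is a unigram
# table, standing in for a real transformer; the loss is the same shape.

corpus = "the cat sat on the mat the cat ran".split()
counts = {w: corpus.count(w) for w in set(corpus)}
total = len(corpus)

def p(token):                       # unigram "model" probability
    return counts[token] / total

# Average NLL over every next-token position in the corpus
nll = -sum(math.log(p(tok)) for tok in corpus[1:]) / (len(corpus) - 1)
print(f"avg NLL: {nll:.3f}  (perplexity {math.exp(nll):.2f})")
```

Pre-, mid-, and post-training SFT all minimise a loss of this form; what changes across the stages is the data mix, not the algorithm.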
AGI Timeline & Future of Programming
Lambert and Raschka reject clean AGI definitions. Key positions:
- AI capabilities are jagged: superhuman at some code types, poor at distributed ML systems and safety-critical engineering
- The “AI 2027” report (which predicted a superhuman coder by 2027–28, later revising its mean estimate to 2031) is praised for concrete milestones, but its singularity assumptions are disputed
- Lambert: narrow software tasks get automated this year; a fully automated AI researcher is more than 5–10 years away
- Raschka: AGI dream is “amplification not paradigm change” — continuous improvement, not a step function
- Lambert: “the dream of AGI” (one model solving everything) is “kind of dying” — specialized models will dominate
Programming future: senior developers already ship 50%+ AI-generated code. Role shifts toward system design, specification, and product management. From-scratch website generation is nearly solved; complex production codebases (Chrome-level) remain hard.
Entities Mentioned
- nathan-lambert — Post-training lead, AI2; co-coined “RLVR”; RLHF book author
- sebastian-raschka — ML researcher; Build a LLM From Scratch (2024); independent educator
- lex-fridman — Host
- deepseek — Chinese open-weight lab; R1 and V3 models; RLVR scaling breakthrough
- anthropic — Claude Opus 4.5; coding-focused culture; $1.5B training data lawsuit
- openai — GPT-5, GPT-5.2, gpt-oss-120b; o1 reasoning model; chaotic but research-innovative
- allen-institute-for-ai (AI2) — OLMo models; open data + code; Nathan Lambert’s employer
- nvidia — Nemotron 3; GPU infrastructure for training; FP8/FP4 system optimizations
Assessment
Credibility: High — Lambert and Raschka are working researchers with hands-on training experience. Bias: Academic/open-source perspective; sceptical of AGI hype; bullish on RLVR. Utility: Best source in this knowledge base for post-training techniques, LLM architecture details, and the open-weight model landscape.