Nathan Lambert

Nathan Lambert is the post-training research lead at the Allen Institute for AI (Ai2), the organisation behind the OLMo family of open-source language models. He is a practitioner and researcher focused on how language models are trained and refined after the initial pre-training phase.

Notable Contributions

  • Co-coined “RLVR” — the term Reinforcement Learning with Verifiable Rewards emerged from Ai2’s Tulu 3 post-training work, to which Lambert was a key contributor; the term was later popularised when DeepSeek R1 demonstrated its scaling properties
  • RLHF Book — author of a widely referenced book on reinforcement learning from human feedback and post-training alignment techniques
  • Tulu 3 — Ai2’s open post-training recipe; one of the first public demonstrations of the RLVR pipeline

Views

Lambert takes a technically grounded, sceptical view of AGI hype. He believes:

  • AI capabilities are jagged, not uniformly superhuman
  • Specialised models will dominate rather than a single general-purpose AGI
  • Narrow task automation is achievable in the near term; automated AI researchers are 5–10 years away

He uses Claude Opus 4.5 as his primary model for coding and philosophy discussions, with GPT-5.2 Thinking for information retrieval and Gemini for fast queries.


Source: fridman-lambert-raschka-2026-state-of-ai