Nathan Lambert
Nathan Lambert is the post-training research lead at the Allen Institute for AI (AI2), the organisation behind the OLMo open-source language model family. He is a practitioner and researcher focused on how language models are trained and refined after the initial pre-training phase.
Notable Contributions
- Co-coined “RLVR” — the term Reinforcement Learning with Verifiable Rewards emerged from AI2’s Tulu 3 post-training work, to which Lambert was a key contributor; the term was later popularised when DeepSeek R1 demonstrated its scaling properties
- RLHF Book — author of a widely-referenced book on reinforcement learning from human feedback and post-training alignment techniques
- Tulu 3 — AI2’s open post-training recipe; one of the first public demonstrations of the RLVR pipeline
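The core idea behind RLVR, as the name suggests, is to replace a learned reward model with a programmatic correctness check during reinforcement learning. A minimal illustrative sketch of such a reward function follows; all function names here are hypothetical and not drawn from Tulu 3's actual codebase:

```python
# Sketch of a verifiable reward: the reward is 1.0 only when the model's
# final answer can be checked programmatically against a known ground truth,
# rather than being scored by a learned reward model.

def extract_final_answer(completion: str) -> str:
    """Toy extraction: take the text after the last 'Answer:' marker."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    return completion[idx + len(marker):].strip() if idx != -1 else completion.strip()

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the extracted answer matches the ground truth, else 0.0."""
    return 1.0 if extract_final_answer(completion) == ground_truth else 0.0

# A correct completion earns the full reward; an incorrect one earns nothing.
reward = verifiable_reward("Let's compute step by step... Answer: 42", "42")
```

Binary rewards of this kind only work on tasks with checkable answers (math, code with unit tests), which is why RLVR is associated with reasoning-focused post-training rather than open-ended chat.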
Views
Lambert takes a technically grounded, sceptical view of AGI hype. He believes:
- AI capabilities are jagged, not uniformly superhuman
- Specialised models will dominate rather than a single AGI
- Narrow task automation is achievable in the near term; automated AI researchers are 5–10 years away
He uses Claude Opus 4.5 as his primary model for coding and philosophy discussions, with GPT-5.2 Thinking for information retrieval and Gemini for fast queries.