Learn about DeepSeek R1’s innovative AI architecture from @deeplearningexplained. The course explores how R1 achieves exceptional reasoning through reinforcement learning, focusing on Group Relative Policy Optimization (GRPO) and how it improves upon traditional PPO methods. You’ll also understand KL divergence’s role in model stability, with practical code examples and clear mathematical explanations.
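To give a feel for the "group relative" part of GRPO before watching, here is a minimal sketch. Several completions are sampled for the same prompt, and each reward is normalized against the mean and standard deviation of its own group, so no separate value network is needed as a baseline. The function name, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not the course's code.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group of sampled completions.

    Assumption: population standard deviation plus a small eps for
    numerical safety; real trainers may use a different convention.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four completions of one prompt (1 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average get a positive advantage and the rest get a negative one, which is the signal the policy update then uses.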
Support for this channel comes from our friends at Scrimba – the coding platform that’s reinvented interactive learning: https://scrimba.com/freecodecamp
Contents
(0:00:00) Introduction
(0:01:49) R1 Overview – Overview
(0:03:52) R1 Overview – DeepSeek R1-Zero path
(0:05:32) R1 Overview – Reinforcement learning setup
(0:08:36) R1 Overview – Group Relative Policy Optimization (GRPO)
(0:13:04) R1 Overview – DeepSeek R1-Zero result
(0:16:53) R1 Overview – Cold start supervised fine-tuning
(0:17:44) R1 Overview – Consistency reward for CoT
(0:18:35) R1 Overview – Supervised fine-tuning data generation
(0:21:06) R1 Overview – Reinforcement learning with neural reward model
(0:22:53) R1 Overview – Distillation
(0:26:16) GRPO – Overview
(0:26:55) GRPO – PPO vs GRPO
(0:30:25) GRPO – PPO formula overview
(0:33:25) GRPO – GRPO formula overview
(0:36:48) GRPO – GRPO pseudo code
(0:38:56) GRPO – GRPO Trainer code
(0:49:24) KL Divergence – Overview
(0:49:55) KL Divergence – KL Divergence in GRPO vs PPO
(0:51:20) KL Divergence – KL Divergence refresher
(0:55:32) KL Divergence – Monte Carlo estimation of KL divergence
(0:56:43) KL Divergence – Schulman blog
(0:57:38) KL Divergence – k1 = log(q/p)
(1:00:01) KL Divergence – k2 = 0.5*(log(p/q))^2
(1:02:19) KL Divergence – k3 = (p/q – 1) – log(p/q)
(1:04:44) KL Divergence – benchmarking
(1:07:28) Conclusion
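The three estimators named in the KL Divergence chapters can be tried out with a short sketch. It draws samples from q and estimates KL(q || p) with k1, k2, and k3, using the ratio r = p(x)/q(x). The two unit-variance Gaussians, the sample count, and all function names are assumptions chosen for illustration, not the benchmark from the video.

```python
import math
import random

def log_normal_pdf(x, mu, sigma=1.0):
    """Log density of a normal distribution N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def kl_estimates(mu_q=0.0, mu_p=0.1, n=100_000, seed=0):
    """Monte Carlo estimates of KL(q || p) from samples x ~ q."""
    rng = random.Random(seed)
    k1 = k2 = k3 = 0.0
    for _ in range(n):
        x = rng.gauss(mu_q, 1.0)
        log_r = log_normal_pdf(x, mu_p) - log_normal_pdf(x, mu_q)  # log(p/q)
        k1 += -log_r                     # k1 = log(q/p): unbiased, can go negative
        k2 += 0.5 * log_r ** 2           # k2 = 0.5*(log(p/q))^2: biased, low variance
        k3 += math.expm1(log_r) - log_r  # k3 = (p/q - 1) - log(p/q): unbiased, never negative
    return k1 / n, k2 / n, k3 / n

# Closed form for two unit-variance Gaussians: KL = (mu_q - mu_p)^2 / 2.
true_kl = 0.5 * (0.0 - 0.1) ** 2
k1, k2, k3 = kl_estimates()
print(f"true={true_kl:.5f} k1={k1:.5f} k2={k2:.5f} k3={k3:.5f}")
```

Every per-sample k3 term is non-negative because log(r) <= r - 1, so the averaged estimate cannot dip below zero the way k1 can on a small batch.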
Thanks to our Champion and Sponsor supporters:
Drake Milly
Ulises Moralez
Goddard Tan
David MG
Matthew Springman
Claudio
Oscar R.
jedi-or-sith
Nattira Maneerat
Justin Hual
—
Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://freecodecamp.org/news
#programming #freecodecamp #learn #learncode #learncoding