Chapter 2: Reinforcement Learning

Take a whirlwind tour through RL, starting from tabular learning and Atari, and ending with some of the cutting-edge techniques used in current LLM post-training.

Sections

2.1 Intro to RL
RL fundamentals: MDPs, policies, value functions, and multi-armed bandits.

2.2 DQN & VPG
Implement DQN and Vanilla Policy Gradient for CartPole and beyond.

2.3 PPO
Build a PPO agent from scratch and train it to master CartPole.

2.4 RLHF
Implement RLHF end-to-end, applying PPO to language model finetuning.
