Chapter 1: Transformer Interpretability

Dive deep into language model interpretability, from linear probes and sparse autoencoders (SAEs) to circuit analysis and toy models.