Circuits in LLMs

This section focuses on analysing circuits in large language models. You'll learn to identify and understand the computational subgraphs responsible for specific model behaviours, and develop the skills to design rigorous experiments on messy, real-world models.

What You'll Learn

  • Circuit Analysis: Identify the attention heads, MLPs, and pathways responsible for specific behaviours
  • Causal Interventions: Use techniques like activation patching, ablation, and path patching to establish causal relationships (see the sketch after this list)
  • Experimental Design: Learn to narrow down large hypothesis spaces and isolate the phenomena you want to study
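For concreteness, here is a minimal sketch of activation patching. It assumes the TransformerLens library (not named in this section, but a common choice for this kind of work), and the layer and position patched are illustrative choices, not findings:

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")

clean_prompt = "John and Mary went to the shops, John gave a bag to"
corrupt_prompt = "John and Mary went to the shops, Mary gave a bag to"

clean_tokens = model.to_tokens(clean_prompt)
corrupt_tokens = model.to_tokens(corrupt_prompt)

# Cache every activation from the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

LAYER, POS = 9, -1  # illustrative patch location, not a result

def patch_residual(resid, hook):
    # Overwrite the corrupted residual stream at one position with the
    # corresponding clean activation.
    resid[:, POS, :] = clean_cache[hook.name][:, POS, :]
    return resid

patched_logits = model.run_with_hooks(
    corrupt_tokens,
    fwd_hooks=[(f"blocks.{LAYER}.hook_resid_pre", patch_residual)],
)

# If the patch restores the clean behaviour, the patched location carries
# information the task causally depends on.
mary, john = model.to_single_token(" Mary"), model.to_single_token(" John")
print((patched_logits[0, -1, mary] - patched_logits[0, -1, john]).item())
```

The clean and corrupted prompts are chosen to tokenise to the same length, so activations can be swapped position-by-position.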

Sections

The exercises in this section are:

  • 1.4.1 Indirect Object Identification: Reverse-engineer the circuit in GPT-2 that completes sentences like "John and Mary went to the shops, John gave a bag to" with "Mary" (a quick sanity check of this behaviour follows the list)
  • 1.4.2 SAE Circuits: Combine circuit analysis with sparse autoencoders to understand model computations at the feature level
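Before reverse-engineering a behaviour, it is worth confirming the model actually exhibits it. A quick sanity check for the IOI task, again assuming TransformerLens:

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-small")

prompt = "John and Mary went to the shops, John gave a bag to"
logits = model(model.to_tokens(prompt))

# GPT-2 Small should prefer the indirect object (" Mary") over the subject.
top_token = logits[0, -1].argmax()
print(model.to_string(top_token))
```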

Key Challenges

Working with production-scale language models presents unique challenges that don't arise in toy models. You'll need to:

  • Filter signal from noise in models with billions of parameters
  • Design experiments that control for confounding variables (see the metric sketch after this list)
  • Build intuition for which hypotheses are worth testing
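As an illustration of the second point: raw probabilities are confounded by the model's overall confidence, so circuit experiments typically score interventions by contrasting the correct answer against a matched distractor and normalising between clean and corrupted baselines. A sketch (the function name and calibration are illustrative, not a standard API):

```python
import torch

def normalised_logit_diff(
    logits: torch.Tensor,   # [batch, seq, d_vocab] from a patched run
    answer_id: int,         # e.g. the token id of " Mary"
    distractor_id: int,     # e.g. the token id of " John"
    clean_diff: float,      # logit difference on the clean baseline run
    corrupt_diff: float,    # logit difference on the corrupted baseline run
) -> float:
    # Contrasting two candidate answers cancels confounds that shift all
    # logits together; normalising maps "fully corrupted" to 0 and
    # "clean behaviour fully restored" to 1.
    diff = (logits[0, -1, answer_id] - logits[0, -1, distractor_id]).item()
    return (diff - corrupt_diff) / (clean_diff - corrupt_diff)
```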

Prerequisites

You should have completed sections 1.1 and 1.2 before starting this material. Familiarity with section 1.3 (particularly SAEs) is helpful for 1.4.2.