[0.4] - Build Your Own Backpropagation Framework
Please send any problems / bugs on the #errata channel in the Slack group, and ask any questions on the dedicated channels for this chapter of material.

Introduction
Today you're going to build your very own system that can run the backpropagation algorithm in essentially the same way as PyTorch does. By the end of the day, you'll be able to train a multi-layer perceptron neural network, using your own backprop system!
The main differences between full PyTorch and our version are:
- We will focus on CPU only, as all the ideas are the same on GPU.
- We will use NumPy arrays internally instead of ATen, the C++ array type used by PyTorch. Backpropagation works independently of the array type.
- A real torch.Tensor has about 700 fields and methods. We will only implement a subset that are particularly instructional and/or necessary to train the MLP.
Note - for today, I'd lean a lot more towards reading the solutions, and even moving on from some of them if you don't fully understand them (especially in the first half of section 3). The low-level, messy implementation details today are much less important than the high-level conceptual takeaways.
For a lecture on the material today, which provides some high-level understanding before you dive into the material, watch the video below:
Content & Learning Objectives
1️⃣ Introduction to backprop
This takes you through what a computational graph is, and the basics of how gradients can be backpropagated through such a graph. You'll also implement the backwards versions of some basic functions: if we have tensors output = func(input), then the backward function of func can calculate the grad of input as a function of the grad of output.
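To make this concrete, here's a sketch of what one of these backward functions might look like, using np.log as an example (the exact names and signatures you'll implement may differ slightly):

import numpy as np

def log_back(grad_out: np.ndarray, out: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Backwards function for out = np.log(x).

    grad_out is the gradient of the loss with respect to out. Since
    d(log x)/dx = 1/x, the chain rule gives dL/dx = grad_out * (1/x).
    """
    return grad_out / x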
Learning Objectives
- Understand what a computational graph is, and how it can be used to calculate gradients.
- Start to implement backwards versions of some basic functions.
2️⃣ Autograd
This section goes into more detail on the backpropagation methodology. In order to find the grad of each tensor in a computational graph, we first have to perform a topological sort of the tensors in the graph, so that each time we try to calculate tensor.grad, we've already computed all the other gradients which are used in this calculation. We end this section by writing a backprop function, which works just like the tensor.backward() method you're already used to in PyTorch.
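As a rough sketch of one common approach (a recursive depth-first search; the version you write may track visited nodes differently, or be iterative), a topological sort can look like this. Here get_children is a hypothetical accessor: in our setting, a tensor's "children" are the input tensors of the function that created it.

def topological_sort(node, get_children):
    """Return all nodes reachable from `node`, ordered so that each node
    appears after every node it points to (post-order depth-first search)."""
    visited = set()
    result = []

    def visit(n):
        if id(n) in visited:  # use id() since nodes may not be hashable
            return
        visited.add(id(n))
        for child in get_children(n):
            visit(child)
        result.append(n)  # appended only once all of n's children are done

    visit(node)
    return result

For backprop, you'd then iterate over this list in reversed order starting from the output tensor, so that every tensor's gradient has been fully accumulated before it gets propagated back to the tensors which produced it.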
Learning Objectives
- Perform a topological sort of a computational graph (and understand why this is important).
- Implement the backprop function, to calculate and store gradients for all tensors in a computational graph.
3️⃣ Training on MNIST from scratch
In this section, you'll build your own equivalents of torch.nn features like nn.Parameter, nn.Module, and nn.Linear. You can then use these to build your own neural network to classify MNIST data.
This completes the chain which starts at basic NumPy arrays, and ends with us being able to build essentially any neural network architecture we want!
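For a flavour of where this is going, here's a heavily simplified sketch of the Parameter and Module abstractions. The Tensor stub and the attribute-scanning parameters method below are purely illustrative; the versions you'll actually write are more complete.

import numpy as np

class Tensor:
    # Hypothetical stand-in for the Tensor class you'll build today
    def __init__(self, array: np.ndarray, requires_grad: bool = False):
        self.array = array
        self.requires_grad = requires_grad
        self.grad = None

class Parameter(Tensor):
    # A Tensor which a Module should treat as trainable
    def __init__(self, array: np.ndarray):
        super().__init__(array, requires_grad=True)

class Module:
    # Minimal sketch: yield Parameters assigned as attributes, recursing into submodules
    def parameters(self):
        for value in self.__dict__.values():
            if isinstance(value, Parameter):
                yield value
            elif isinstance(value, Module):
                yield from value.parameters()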
Learning Objectives
- Implement more forward and backward functions, including for indexing, summing, and matrix multiplication (see the sketch after this list).
- Learn how to build higher-level abstractions like parameters and modules on top of individual functions and tensors.
- Complete the process of building up a neural network from scratch and training it via gradient descent.
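For example, ignoring broadcasting and batch dimensions, the backward functions for a 2D matrix multiplication out = x @ y follow directly from the chain rule (a sketch; the function names here are just illustrative):

import numpy as np

def matmul_back_x(grad_out: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # For out = x @ y: dL/dx = dL/dout @ y^T
    return grad_out @ y.T

def matmul_back_y(grad_out: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # For out = x @ y: dL/dy = x^T @ dL/dout
    return x.T @ grad_out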
4️⃣ Bonus
A few bonus exercises are suggested, for pushing your understanding of backpropagation further.
Setup code
import os
import re
import sys
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator
import numpy as np
from torch.utils.data import DataLoader
from tqdm import tqdm
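# `Arr` is a short alias used in type hints throughout; `grad_tracking_enabled`
# is a global flag we'll use later to temporarily switch off gradient tracking
# (similar in spirit to torch.no_grad)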
Arr = np.ndarray
grad_tracking_enabled = True
# Make sure exercises are in the path
chapter = "chapter0_fundamentals"
section = "part4_backprop"
root_dir = next(p for p in Path.cwd().parents if (p / chapter).exists())
exercises_dir = root_dir / chapter / "exercises"
section_dir = exercises_dir / section
if str(exercises_dir) not in sys.path:
    sys.path.append(str(exercises_dir))
MAIN = __name__ == "__main__"
import part4_backprop.tests as tests
from part4_backprop.utils import get_mnist, visualize
from plotly_utils import line