[0.4] - Build Your Own Backpropagation Framework
Please send any problems / bugs on the #errata channel in the Slack group, and ask any questions on the dedicated channels for this chapter of material.

Introduction
Today you're going to build your very own system that can run the backpropagation algorithm in essentially the same way as PyTorch does. By the end of the day, you'll be able to train a multi-layer perceptron neural network, using your own backprop system!
The main differences between full PyTorch and our version are:
- We will focus on CPU only, as all the ideas are the same on GPU.
- We will use NumPy arrays internally instead of ATen, the C++ array type used by PyTorch. Backpropagation works independently of the array type.
- A real torch.Tensor has about 700 fields and methods. We will only implement a subset that are particularly instructional and/or necessary to train the MLP.
Note - for today, I'd lean a lot more towards reading the solutions, and even moving on from some of them if you don't fully understand them (especially in the first half of section 3). The low-level, messy implementation details today are much less important than the high-level conceptual takeaways.
For a lecture on the material today, which provides some high-level understanding before you dive into the material, watch the video below:
Content & Learning Objectives
1️⃣ Introduction to backprop
This takes you through what a computational graph is, and the basics of how gradients can be backpropagated through such a graph. You'll also implement the backwards versions of some basic functions: if we have tensors output = func(input), then the backward function of func can calculate the grad of input as a function of the grad of output.
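To make this concrete, here's a sketch of what one of these backward functions might look like, using np.log as an example (the exact names and signatures you'll implement may differ slightly):

import numpy as np

def log_back(grad_out: np.ndarray, out: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Backwards function for out = np.log(x).

    grad_out is the gradient of the loss with respect to out. Since
    d(log x)/dx = 1/x, the chain rule gives dL/dx = grad_out * (1/x).
    """
    return grad_out / x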
Learning Objectives
- Understand what a computational graph is, and how it can be used to calculate gradients.
- Start to implement backwards versions of some basic functions.
2️⃣ Autograd
This section goes into more detail on the backpropagation methodology. In order to find the grad of each tensor in a computational graph, we first have to perform a topological sort of the tensors in the graph, so that each time we try to calculate tensor.grad, we've already computed all the other gradients which are used in this calculation. We end this section by writing a backprop function, which works just like the tensor.backward() method you're already used to in PyTorch.
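As a rough sketch of one common approach (a recursive depth-first search; the version you write may track visited nodes differently, or be iterative), a topological sort can look like this. Here get_children is a hypothetical accessor: in our setting, a tensor's "children" are the input tensors of the function that created it.

def topological_sort(node, get_children):
    """Return all nodes reachable from `node`, ordered so that each node
    appears after every node it points to (post-order depth-first search)."""
    visited = set()
    result = []

    def visit(n):
        if id(n) in visited:  # use id() since nodes may not be hashable
            return
        visited.add(id(n))
        for child in get_children(n):
            visit(child)
        result.append(n)  # appended only once all of n's children are done

    visit(node)
    return result

For backprop, you'd then iterate over this list in reversed order starting from the output tensor, so that every tensor's gradient has been fully accumulated before it gets propagated back to the tensors which produced it.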
Learning Objectives
- Perform a topological sort of a computational graph (and understand why this is important).
- Implement the backprop function, to calculate and store gradients for all tensors in a computational graph.
3️⃣ Training on MNIST from scratch
In this section, you'll build your own equivalents of torch.nn features like nn.Parameter, nn.Module, and nn.Linear. You can then use these to build your own neural network to classify MNIST data.
This completes the chain which starts at basic NumPy arrays, and ends with us being able to build essentially any neural network architecture we want!
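For a flavour of where this is going, here's a heavily simplified sketch of the Parameter and Module abstractions. The Tensor stub and the attribute-scanning parameters method below are purely illustrative; the versions you'll actually write are more complete.

import numpy as np

class Tensor:
    # Hypothetical stand-in for the Tensor class you'll build today
    def __init__(self, array: np.ndarray, requires_grad: bool = False):
        self.array = array
        self.requires_grad = requires_grad
        self.grad = None

class Parameter(Tensor):
    # A Tensor which a Module should treat as trainable
    def __init__(self, array: np.ndarray):
        super().__init__(array, requires_grad=True)

class Module:
    # Minimal sketch: yield Parameters assigned as attributes, recursing into submodules
    def parameters(self):
        for value in self.__dict__.values():
            if isinstance(value, Parameter):
                yield value
            elif isinstance(value, Module):
                yield from value.parameters()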
Learning Objectives
- Implement more forward and backward functions, including for indexing, summing, and matrix multiplication (see the sketch after this list).
- Learn how to build higher-level abstractions like parameters and modules on top of individual functions and tensors.
- Complete the process of building up a neural network from scratch and training it via gradient descent.
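For example, ignoring broadcasting and batch dimensions, the backward functions for a 2D matrix multiplication out = x @ y follow directly from the chain rule (a sketch; the function names here are just illustrative):

import numpy as np

def matmul_back_x(grad_out: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # For out = x @ y: dL/dx = dL/dout @ y^T
    return grad_out @ y.T

def matmul_back_y(grad_out: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # For out = x @ y: dL/dy = x^T @ dL/dout
    return x.T @ grad_out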
4️⃣ Bonus
A few bonus exercises are suggested, for pushing your understanding of backpropagation further.
Setup code
import os
import re
import sys
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator
import numpy as np
from torch.utils.data import DataLoader
from tqdm import tqdm
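# `Arr` is a short alias used in type hints throughout; `grad_tracking_enabled`
# is a global flag we'll use later to temporarily switch off gradient tracking
# (similar in spirit to torch.no_grad)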
Arr = np.ndarray
grad_tracking_enabled = True
# Make sure exercises are in the path
chapter = "chapter0_fundamentals"
section = "part4_backprop"
root_dir = next(p for p in Path.cwd().parents if (p / chapter).exists())
exercises_dir = root_dir / chapter / "exercises"
section_dir = exercises_dir / section
if str(exercises_dir) not in sys.path:
    sys.path.append(str(exercises_dir))
MAIN = __name__ == "__main__"
import part4_backprop.tests as tests
from part4_backprop.utils import get_mnist, visualize
from plotly_utils import line