[0.3] Optimization & Hyperparameters

Colab: exercises | solutions

Please report any problems / bugs in the #errata channel in the Slack group, and ask any questions in the dedicated channels for this chapter of material.

If you want to change to dark mode, you can do this by clicking the three horizontal lines in the top-right, then navigating to Settings → Theme.

Links to all other chapters: (0) Fundamentals, (1) Transformer Interpretability, (2) RL.

Introduction

In today's exercises, we will explore various optimization algorithms and their roles in training deep learning models. We will delve into the inner workings of different optimization techniques such as Stochastic Gradient Descent (SGD), RMSprop, and Adam, and learn how to implement them in code. Additionally, we will discuss the concept of loss landscapes and their significance in visualizing the challenges faced during the optimization process. By the end of this set of exercises, you will have a solid understanding of optimization algorithms and their impact on model performance. We'll also take a look at Weights & Biases, a tool for tracking and visualizing the training process, and for testing different hyperparameter values to find the most effective ones.

Note - the third set of exercises in this section is on distributed training, and has different requirements: specifically, you'll need to SSH into a virtual machine with multiple GPUs, and run the exercises from a Python file (not a notebook or Colab). However, you can still work through the first two sections as normal and then make this switch for the third.

For a lecture on today's material, which provides some high-level understanding before you dive in, watch the video below:

Content & Learning Objectives

1️⃣ Optimizers

These exercises will take you through how different optimization algorithms work (specifically SGD, RMSprop and Adam). You'll write your own optimizers, and use plotting functions to visualize gradient descent on loss landscapes. A minimal sketch of the kind of optimizer interface you'll implement is shown after the objectives below.

Learning Objectives
  • Understand how different optimization algorithms work
  • Translate pseudocode for these algorithms into code
  • Understand the idea of loss landscapes, and how they can be used to visualize specific challenges in the optimization process
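
To give a flavour of what you'll be building, here is a minimal sketch of a hand-rolled optimizer: plain SGD with no momentum or weight decay, wrapped in the zero_grad / step interface that PyTorch's optimizers use. The class name and exact signature here are illustrative only, not the exercise solution.

# Minimal sketch of a custom optimizer: vanilla SGD (no momentum, no weight decay).
class BasicSGD:
    def __init__(self, params: Iterable[t.nn.parameter.Parameter], lr: float):
        self.params = list(params)  # materialize the generator so we can iterate more than once
        self.lr = lr

    def zero_grad(self) -> None:
        # Clear stored gradients before the next backward pass
        for p in self.params:
            p.grad = None

    @t.inference_mode()
    def step(self) -> None:
        # In-place update: theta <- theta - lr * grad(theta)
        for p in self.params:
            if p.grad is not None:
                p -= self.lr * p.grad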

2️⃣ Weights and Biases

In this section, we'll look at methods for choosing hyperparameters effectively. You'll learn how to use Weights & Biases, a useful tool for logging runs and running hyperparameter searches (a minimal sketch of the workflow follows the objectives below). By the end of today, you should be able to use Weights & Biases to train the ResNet you created in the last set of exercises.

Learning Objectives
  • Write modular, extensible code for training models
  • Learn what the most important hyperparameters are, and methods for efficiently searching over hyperparameter space
  • Learn how to use Weights & Biases for logging your runs
  • Adapt your code from yesterday to log training runs to Weights & Biases, and use this service to run hyperparameter sweeps
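
For orientation, here is a minimal sketch of the Weights & Biases workflow this section builds up to: initialize a run, log metrics as you train, then define a sweep over hyperparameters. The project name, config values and metric names below are placeholders rather than part of the exercises, and the "training loop" is a stand-in for a real one.

def train_with_logging():
    # Start a run; hyperparameters go in `config` so W&B can track (and sweep over) them
    wandb.init(project="day3-resnet", config={"lr": 1e-3, "batch_size": 64})
    for step in range(100):
        loss = 1.0 / (step + 1)                # stand-in for a real training loss
        wandb.log({"loss": loss}, step=step)   # logged metrics appear live in the dashboard
    wandb.finish()

# A sweep is just a config dict describing which hyperparameters to vary and how,
# plus an agent that repeatedly calls your training function with sampled values:
sweep_config = {
    "method": "random",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {"lr": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-1}},
}
# sweep_id = wandb.sweep(sweep_config, project="day3-resnet")
# wandb.agent(sweep_id, function=train_with_logging, count=5)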

3️⃣ Distributed Training

In this section, we'll take you through the basics of distributed training: splitting training across multiple GPUs to improve throughput and make larger workloads feasible. A toy sketch of the torch.distributed primitives involved is shown after the objectives below.

Learning Objectives
  • Understand the different kinds of parallelization used in deep learning (data, pipeline, tensor)
  • Understand how primitive operations in torch.distributed work, and how they come together to enable distributed training
  • Launch and benchmark your own distributed training runs, to train your implementation of ResNet34 from scratch
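
As a preview, here is a toy sketch of the torch.distributed primitives this section introduces: several processes each hold a tensor, and an all-reduce sums them across the group. It uses the CPU-only gloo backend so it runs without GPUs; the port number and world size are arbitrary choices, and (as noted above) the real exercises are run from a Python file on a multi-GPU machine rather than from a notebook.

def all_reduce_demo(rank: int, world_size: int):
    # Each process needs a rendezvous address/port so the group can find each other
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12345"
    dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)

    x = t.ones(3) * rank                       # each rank starts with a different tensor
    dist.all_reduce(x, op=dist.ReduceOp.SUM)   # in-place: every rank now holds the sum
    print(f"rank {rank}: {x}")

    dist.destroy_process_group()

if MAIN:
    # Spawn 2 processes, each running all_reduce_demo(rank, world_size=2).
    # Run this from a script rather than a notebook (mp.spawn pickles the target function).
    mp.spawn(all_reduce_demo, args=(2,), nprocs=2)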

4️⃣ Bonus

This section gives you suggestions for further exploration of optimizers, and Weights & Biases.

Setup code

import importlib
import os
import sys
import time
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Callable, Iterable, Literal

import numpy as np
import torch as t
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn.functional as F
import wandb
from IPython.core.display import HTML
from IPython.display import display
from jaxtyping import Float, Int
from torch import Tensor, optim
from torch.utils.data import DataLoader, DistributedSampler
from torchvision import datasets, transforms
from tqdm import tqdm

# Make sure exercises are in the path
chapter = "chapter0_fundamentals"
section = "part3_optimization"
root_dir = next(p for p in Path.cwd().parents if (p / chapter).exists())
exercises_dir = root_dir / chapter / "exercises"
section_dir = exercises_dir / section
if str(exercises_dir) not in sys.path:
    sys.path.append(str(exercises_dir))

MAIN = __name__ == "__main__"

import part3_optimization.tests as tests
from part2_cnns.solutions import Linear, ResNet34, get_resnet_for_feature_extraction
from part3_optimization.utils import plot_fn, plot_fn_with_points
from plotly_utils import bar, imshow, line

device = t.device(
    "mps" if t.backends.mps.is_available() else "cuda" if t.cuda.is_available() else "cpu"
)

Help - I get a NumPy-related error

This is an annoying Colab-related issue for which I haven't been able to find a satisfying fix. If you restart the runtime (but don't delete it) and run just the imports cell above again (but not the %pip install cell), the problem should go away.