4-Digit Sum (September 2023)
Colab: problem | solutions

Difficulty
This problem is probably a step up in difficulty from the August problem. The solution is more involved, and may require different or more creative interpretability approaches.
Task & Dataset
The model takes in a sequence of 11 tokens: 4 digits, a + sign, 4 digits, an = sign, and then 1 token representing the first digit of the sum. The model outputs the next 4 characters of the sum (one for each of the last 4 positions). In other words, the model is trained to add two 4-digit numbers.
Note that both the input and the output are given in "little-endian" format (i.e. the units digit comes first, then the tens digit, etc.). This makes the model easier to train.
Here is an example:
Input: 1234 + 5678 = ?
Output: ????0
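The token layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset code used for the problem: the token ids for + and = (10 and 11 here) are hypothetical, as are the helper names `to_little_endian` and `encode_example`, and the sum is assumed to be zero-padded to 5 digits so that 1 given digit plus 4 predicted digits covers it. The example digit strings are read as little-endian, so "1234" stands for 4321 and "5678" for 8765.

```python
PLUS, EQUALS = 10, 11  # hypothetical token ids; the text does not specify them


def to_little_endian(n: int, width: int) -> list[int]:
    """Return the digits of n, units digit first, zero-padded to `width`."""
    return [(n // 10**i) % 10 for i in range(width)]


def encode_example(a: int, b: int) -> tuple[list[int], list[int]]:
    """Build the 11 input tokens and the 4 target digits for a + b."""
    assert 0 <= a < 10_000 and 0 <= b < 10_000
    # Two 4-digit numbers sum to at most 19998, so 5 digits always suffice.
    total = to_little_endian(a + b, 5)
    tokens = (
        to_little_endian(a, 4) + [PLUS]
        + to_little_endian(b, 4) + [EQUALS]
        + total[:1]  # first (units) digit of the sum is part of the input
    )
    targets = total[1:]  # the remaining 4 digits are what the model predicts
    return tokens, targets


tokens, targets = encode_example(4321, 8765)
print(tokens)   # 11 tokens: operand digits, "+", operand digits, "=", units digit
print(targets)  # tens, hundreds, thousands, ten-thousands digits of the sum
```

Under these assumptions the sequence has exactly 11 tokens, and the four targets are the higher-order digits of the sum in little-endian order.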
Model
The model is a 3-layer, attention-only transformer with 3 attention heads and causal attention. It includes layernorm but no MLP layers.
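As a rough illustration of what causal attention means for this 11-token layout, the sketch below builds the attention mask in plain Python. The position indices assume the ordering given in the task description (4 digits, +, 4 digits, =, first sum digit), and `causal_mask` is a hypothetical helper, not part of the model's code.

```python
SEQ_LEN = 11  # 4 digits, "+", 4 digits, "=", first sum digit


def causal_mask(seq_len: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


mask = causal_mask(SEQ_LEN)

# Under the assumed layout, "=" sits at index 9, so from that position onward
# every operand digit (indices 0-3 and 5-8) is visible to the attention heads.
operand_positions = [0, 1, 2, 3, 5, 6, 7, 8]
print(all(mask[9][k] for k in operand_positions))
```

Because there are no MLP layers, any computation of the sum has to be carried by attention patterns like these together with the embedding and unembedding.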