Cumulative Sum (November 2023)
Colab: problem | solutions

Task & Dataset
The model takes as input a sequence of randomly sampled digits from 0 to 9. At each sequence position, it outputs the cumulative sum of all digits up to and including that position, mod 10.
For example:
Input: [5, 3, 1, 7, ...]
Output: [5, 8, 9, 6, ...]
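To make the target concrete, here is a minimal sketch of how input/label pairs for this task could be generated. The function names `cumsum_mod10` and `sample_dataset`, the uniform sampling, and the `seed` parameter are illustrative assumptions, not the dataset class used in the Colab.

```python
import random

def cumsum_mod10(digits):
    """Return the running sum of `digits`, taken mod 10, at every position."""
    out = []
    total = 0
    for d in digits:
        total = (total + d) % 10  # reduce as we go; same result as summing then mod 10
        out.append(total)
    return out

def sample_dataset(n_samples, seq_len, seed=0):
    """Hypothetical generator: uniformly random digit sequences paired with targets."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        seq = [rng.randint(0, 9) for _ in range(seq_len)]  # randint is inclusive of 9
        data.append((seq, cumsum_mod10(seq)))
    return data

print(cumsum_mod10([5, 3, 1, 7]))  # [5, 8, 9, 6]
```

This reproduces the example above: 5, then 5 + 3 = 8, then 8 + 1 = 9, then 9 + 7 = 16, which is 6 mod 10.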
Model
The model is a 2-layer transformer with causal attention, 3 attention heads in layer 0, and 2 attention heads in layer 1. It includes MLPs and layernorm.
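Causal attention means that each position can only attend to itself and earlier positions, which is exactly the prefix of digits needed to compute that position's target. The sketch below illustrates this with a boolean mask; the helper names are hypothetical, and it only shows what information is visible at each position, not how the trained model actually computes the sum.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def visible_digits(digits, mask, i):
    """Return the digits that position i can see under the given mask."""
    return [d for d, allowed in zip(digits, mask[i]) if allowed]

digits = [5, 3, 1, 7]
mask = causal_mask(len(digits))
for i in range(len(digits)):
    seen = visible_digits(digits, mask, i)
    # The visible prefix is sufficient to determine the target at position i.
    print(i, seen, sum(seen) % 10)
```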