Cumulative Sum (November 2023)

Colab: problem | solutions

Task & Dataset

The model takes a sequence of randomly sampled digits (0 to 9) as input. At each sequence position, it outputs the cumulative sum of the digits up to and including that position, mod 10.

For example:

Input:  [5, 3, 1, 7, ...]
Output: [5, 8, 9, 6, ...]
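
As a rough illustration of the task, the sketch below computes the target labels for a digit sequence and generates one random example. The function names (cumsum_mod10, make_example), the sequence length, and the seed are assumptions made for this sketch; they are not taken from the original notebook.

```python
import random
from itertools import accumulate


def cumsum_mod10(digits):
    """Return the running sum of `digits`, reduced mod 10 at every position."""
    return [total % 10 for total in accumulate(digits)]


def make_example(seq_len, rng):
    """Sample `seq_len` digits uniformly from 0..9 and compute their labels.

    `seq_len` and `rng` are hypothetical parameters chosen for illustration.
    """
    digits = [rng.randrange(10) for _ in range(seq_len)]
    return digits, cumsum_mod10(digits)


# Reproduces the example above: 5, 5+3=8, 8+1=9, 9+7=16 -> 6.
print(cumsum_mod10([5, 3, 1, 7]))  # [5, 8, 9, 6]

# One randomly generated (input, output) pair.
digits, labels = make_example(seq_len=8, rng=random.Random(0))
print(digits)
print(labels)
```

Because the label at each position depends only on the previous label and the current digit, the same result can be obtained by carrying a single running value mod 10 rather than the full sum.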

Model

The model is a 2-layer transformer with causal attention: layer 0 has 3 attention heads and layer 1 has 2. It includes MLPs and layernorm.
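
Causal attention fits this task naturally, since the output at a position depends only on the digits at or before it. The minimal sketch below builds the attention mask this implies; the helper name causal_mask is hypothetical, and this is an illustration of the general idea rather than the model's actual implementation.

```python
def causal_mask(seq_len):
    """Return a seq_len x seq_len boolean mask.

    mask[query][key] is True when the query position is allowed to attend
    to the key position, i.e. when key <= query.
    """
    return [[key <= query for key in range(seq_len)] for query in range(seq_len)]


# Print the mask for a length-4 sequence: row i shows what position i can see.
for row in causal_mask(4):
    print("".join("x" if allowed else "." for allowed in row))
```

Row i of the mask has exactly i + 1 allowed positions, which are the same positions that contribute to the cumulative sum at position i.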