Sorted List (October 2023)
Colab: problem | solutions

Task & Dataset
Each sequence in the dataset consists of:
- A start token
- A sequence of 10 numbers (this will include repeats if d_vocab < 10)
- An end token
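A minimal sketch of how one such sequence could be built. The source does not give token ids or a sampling scheme, so the following are assumptions. Numbers use ids `0..d_vocab-1`. The start and end tokens get the hypothetical ids `d_vocab` and `d_vocab + 1`. Some lists are sorted on purpose (the hypothetical `p_sorted` parameter), because uniform draws are almost never in order and the two classes would otherwise be badly imbalanced.

```python
import random


def make_sequence(rng: random.Random, d_vocab: int = 10, p_sorted: float = 0.5) -> list[int]:
    """Return [start, x_1, ..., x_10, end] (hypothetical helper, not the source's code).

    Each x_i is drawn uniformly from 0..d_vocab-1, so with d_vocab < 10 the
    10 numbers must contain repeats (pigeonhole principle).
    """
    start, end = d_vocab, d_vocab + 1  # assumed ids for the start/end tokens
    numbers = [rng.randrange(d_vocab) for _ in range(10)]
    # Assumed balancing trick: sort the list with probability p_sorted so that
    # "sorted" examples are not vanishingly rare.
    if rng.random() < p_sorted:
        numbers.sort()
    return [start] + numbers + [end]
```

For example, `make_sequence(random.Random(0), d_vocab=5)` yields 12 tokens whose 10 inner numbers all lie in `0..4`, so at least one value repeats.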
The model is trained to output 1 (i.e. "sorted") at the position after the end token if the 10 numbers are in ascending order, and 0 (i.e. "unsorted") otherwise.
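The labelling rule can be written as a one-line check on the 10 numbers. This is a sketch. It assumes the sequence layout `[start, x_1, ..., x_10, end]`, and it counts equal neighbours as sorted (non-decreasing). That reading of "ascending" is an assumption, since repeats can occur.

```python
def sorted_label(sequence: list[int]) -> int:
    """Target at the final position: 1 ("sorted") or 0 ("unsorted").

    Assumes sequence = [start, x_1, ..., x_10, end]; ties count as sorted.
    """
    numbers = sequence[1:-1]  # strip the start and end tokens
    return int(all(a <= b for a, b in zip(numbers, numbers[1:])))
```

The start and end ids are ignored by the check, so the function works for any choice of special-token ids.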
Model
The model is a 1-layer transformer with 3 attention heads and causal attention. It includes layernorm but no MLP layers.
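The architecture can be illustrated with a NumPy forward pass of a 1-layer, attention-only transformer with 3 causal heads. Several details are assumptions, not taken from the source:

- Layernorm is applied before attention and before the unembedding, and it has no learned gain or bias.
- The dimensions `d_model=48` and `d_head=16` are hypothetical.
- The output vocabulary has 2 entries (unsorted/sorted).
- The weights are random rather than trained.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalise each position's residual vector (no learned gain/bias, an assumption)."""
    mean = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def init_params(d_vocab, d_vocab_out, n_ctx, d_model=48, n_heads=3, d_head=16, seed=0):
    """Random weights with hypothetical sizes; a trained model would replace these."""
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(0.0, 0.02, size=shape)

    return {
        "W_E": w(d_vocab, d_model),            # token embedding
        "W_pos": w(n_ctx, d_model),            # positional embedding
        "W_Q": w(n_heads, d_model, d_head),
        "W_K": w(n_heads, d_model, d_head),
        "W_V": w(n_heads, d_model, d_head),
        "W_O": w(n_heads, d_head, d_model),
        "W_U": w(d_model, d_vocab_out),        # unembedding
    }


def attn_only_forward(tokens: np.ndarray, params: dict) -> np.ndarray:
    """tokens: (seq,) ints -> logits: (seq, d_vocab_out). One attention layer, no MLP."""
    seq = len(tokens)
    x = params["W_E"][tokens] + params["W_pos"][:seq]  # residual stream
    h = layer_norm(x)
    # True above the diagonal: position i may not attend to positions j > i.
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    attn_out = np.zeros_like(x)
    for i in range(params["W_Q"].shape[0]):  # loop over the heads
        q, k, v = h @ params["W_Q"][i], h @ params["W_K"][i], h @ params["W_V"][i]
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores = np.where(future, -np.inf, scores)
        scores -= scores.max(-1, keepdims=True)  # numerically stable softmax
        pattern = np.exp(scores)
        pattern /= pattern.sum(-1, keepdims=True)
        attn_out += pattern @ v @ params["W_O"][i]
    x = x + attn_out  # heads write back into the residual stream
    return layer_norm(x) @ params["W_U"]
```

Under these assumptions, the sorted/unsorted prediction would be read from `logits[-1]`, the output at the end-token position. Because attention is causal, that output is the model's prediction for the position after the end token. Changing the final token leaves all earlier positions' logits unchanged.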