Sorted List (October 2023)

Colab: problem | solutions

Task & Dataset

Each sequence in the dataset consists of:

  • A start token [
  • A sequence of 10 numbers (necessarily including repeats if d_vocab < 10)
  • An end token ]

The model is trained to output 1 (i.e. "sorted") at the position after the end token if the 10 numbers are in ascending order, and 0 (i.e. "unsorted") otherwise.
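
The sequence format and label can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset code from the Colab: the names `make_example`, `is_sorted`, and `force_sorted` are hypothetical, the start and end tokens are shown as placeholder strings rather than real token ids, and ties are assumed to count as sorted (the text says "ascending" without specifying how repeats are treated).

```python
import random

START, END = "[", "]"  # placeholder symbols; the real token ids are not given in the text


def is_sorted(nums):
    """True if nums is in non-decreasing order (assumption: ties count as sorted)."""
    return all(a <= b for a, b in zip(nums, nums[1:]))


def make_example(d_vocab, rng, list_len=10, force_sorted=False):
    """Build one (tokens, label) pair: start token, list_len numbers, end token.

    force_sorted is a hypothetical knob for producing positive examples;
    how the original dataset balances its classes is not stated.
    """
    nums = [rng.randrange(d_vocab) for _ in range(list_len)]
    if force_sorted:
        nums.sort()
    tokens = [START, *nums, END]
    label = int(is_sorted(nums))  # 1 = "sorted", 0 = "unsorted"
    return tokens, label


rng = random.Random(0)
tokens, label = make_example(d_vocab=5, rng=rng)
print(tokens, label)
```

With uniform sampling almost every list is unsorted, so a usable training set would presumably need many more positive examples than this sampler produces by chance.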

Model

The model is a 1-layer transformer with 3 attention heads and causal attention. It includes layernorm but has no MLP layers.
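
As a rough aid to reading that description, the stated hyperparameters and the causal attention pattern can be written down in plain Python. This is a sketch only: `ModelSpec` and `causal_mask` are hypothetical names, not part of any library, and dimensions the text does not state (such as the model width or head size) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical container for the hyperparameters stated in the text."""
    n_layers: int = 1
    n_heads: int = 3
    attn_only: bool = True       # no MLP layers
    use_layernorm: bool = True
    causal: bool = True
    seq_len: int = 12            # start token + 10 numbers + end token


def causal_mask(seq_len):
    """mask[q][k] is True when query position q may attend to key position k."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


spec = ModelSpec()
mask = causal_mask(spec.seq_len)

# Count how many positions each query can attend to.
visible = [sum(row) for row in mask]
print(visible)
```

Under this mask the end-token position can attend to the start token and all 10 numbers, which is what lets a single attention layer gather the whole list at the position where the classification is read off.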