Trigrams (November 2024)
Colab: problem

Task & Dataset
The dataset consists of character sequences in which a known set of "trigrams" (3-character sequences) appears with higher-than-baseline probability. The model's task is to predict the next character given the previous characters.
More specifically:
- There are 53 tokens: 26 letters A-Z, 26 letters a-z, and a start token
- The special trigrams have the form (A, a, b), where b always follows the pair Aa
- Sequences consist mostly of randomly sampled characters, except that the special trigrams appear with probability ~5%
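The generation process described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the problem's actual data generator: the start-token string `<s>`, the helper names `make_trigrams` and `sample_sequence`, the number of trigrams, the sequence length, and the reading of "~5%" as a per-step probability of emitting a whole trigram are all hypothetical. Trigram prefixes are kept unique so that each pair Aa has exactly one completion b.

```python
import random
import string

# Vocabulary: start token + 26 uppercase + 26 lowercase letters = 53 tokens.
# The start-token string "<s>" is a hypothetical placeholder.
START = "<s>"
VOCAB = [START] + list(string.ascii_uppercase) + list(string.ascii_lowercase)
assert len(VOCAB) == 53
LETTERS = VOCAB[1:]


def make_trigrams(n_trigrams, rng):
    """Hypothetical: sample trigrams (A, a, b) with unique prefixes (A, a),
    so that each prefix determines its completion b."""
    completions = {}
    while len(completions) < n_trigrams:
        prefix = (rng.choice(LETTERS), rng.choice(LETTERS))
        if prefix not in completions:
            completions[prefix] = rng.choice(LETTERS)
    return [(a, b, c) for (a, b), c in completions.items()]


def sample_sequence(seq_len, trigrams, p_trigram, rng):
    """Hypothetical scheme: at each step, with probability p_trigram emit a
    whole special trigram, otherwise emit one uniformly random letter."""
    seq = [START]
    while len(seq) < seq_len + 1:
        if rng.random() < p_trigram:
            seq.extend(rng.choice(trigrams))
        else:
            seq.append(rng.choice(LETTERS))
    return seq[: seq_len + 1]  # truncate in case a trigram overran the end


# Usage: 10 hypothetical trigrams, one sequence of 30 characters.
rng = random.Random(0)
trigrams = make_trigrams(n_trigrams=10, rng=rng)
seq = sample_sequence(seq_len=30, trigrams=trigrams, p_trigram=0.05, rng=rng)
print("".join(seq[1:]))  # drop the start token before printing
```

With p_trigram set to 0.05, most positions are pure noise, so a model can only beat chance right after it has seen the start of a special trigram.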

Model
The model is a 2-layer transformer with 3 causal attention heads per layer. It includes LayerNorm but no MLP layers, so it is attention-only.
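To make the architecture concrete, here is a pure-Python forward pass of a 2-layer, 3-head, attention-only transformer with causal masking and LayerNorm, using random (untrained) weights. It is an illustrative sketch only: d_model, d_head, the context length, learned absolute positional embeddings, pre-LayerNorm placement, LayerNorm without learned scale and bias, and using token id 0 for the start token are assumptions, not details taken from the problem.

```python
import math
import random

# From the problem description: 53 tokens, 2 layers, 3 heads per layer.
# d_model, d_head and n_ctx are hypothetical sizes chosen for illustration.
D_VOCAB, N_LAYERS, N_HEADS = 53, 2, 3
D_MODEL, D_HEAD, N_CTX = 12, 4, 32


def randn(rows, cols, rng, scale=0.1):
    """A rows x cols matrix of Gaussian random weights (untrained)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(x, W):
    """Row vector x (length = rows of W) times matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def layernorm(x, eps=1e-5):
    """LayerNorm without learned scale/bias (i.e. scale 1, bias 0)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def init_params(seed=0):
    rng = random.Random(seed)

    def head():
        return {
            "W_Q": randn(D_MODEL, D_HEAD, rng),
            "W_K": randn(D_MODEL, D_HEAD, rng),
            "W_V": randn(D_MODEL, D_HEAD, rng),
            "W_O": randn(D_HEAD, D_MODEL, rng),
        }

    return {
        "W_E": randn(D_VOCAB, D_MODEL, rng),
        "W_pos": randn(N_CTX, D_MODEL, rng),
        "layers": [[head() for _ in range(N_HEADS)] for _ in range(N_LAYERS)],
        "W_U": randn(D_MODEL, D_VOCAB, rng),
    }


def attention_head(xs, h):
    """Causal attention for one head over the list of residual vectors xs."""
    qs = [matvec(x, h["W_Q"]) for x in xs]
    ks = [matvec(x, h["W_K"]) for x in xs]
    vs = [matvec(x, h["W_V"]) for x in xs]
    out = []
    for t, q in enumerate(qs):
        # Causal mask: position t may only attend to positions 0..t.
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(D_HEAD)
                  for s in range(t + 1)]
        m = max(scores)
        w = [math.exp(sc - m) for sc in scores]  # numerically stable softmax
        total = sum(w)
        z = [sum(w[s] / total * vs[s][i] for s in range(t + 1)) for i in range(D_HEAD)]
        out.append(matvec(z, h["W_O"]))
    return out


def forward(tokens, params):
    """Return a list of D_VOCAB logits for every position in tokens."""
    resid = [[e + p for e, p in zip(params["W_E"][tok], params["W_pos"][t])]
             for t, tok in enumerate(tokens)]
    for heads in params["layers"]:
        normed = [layernorm(x) for x in resid]  # pre-LN (assumed placement)
        head_outs = [attention_head(normed, h) for h in heads]
        # No MLP: each layer only adds the summed head outputs to the residual.
        resid = [[r + sum(ho[t][i] for ho in head_outs) for i, r in enumerate(x)]
                 for t, x in enumerate(resid)]
    return [matvec(layernorm(x), params["W_U"]) for x in resid]


# Usage: 4 input positions give 4 rows of 53 logits.
params = init_params(seed=0)
logits = forward([0, 5, 30, 7], params)
```

Because position t only reads from positions 0..t, its logits depend only on the tokens up to and including t, which is what allows the model to be trained on every position of a sequence at once.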
The model has been trained to predict the next token at each position in the sequence.
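Training on every position usually means averaging a cross-entropy loss over positions, where the target at position t is the token at position t + 1. The problem does not state the loss, so the sketch below assumes this standard setup; `next_token_loss` is a hypothetical helper that takes one row of logits per position.

```python
import math


def next_token_loss(logits, tokens):
    """Mean cross-entropy of predicting tokens[t + 1] from logits[t].

    logits: one list of vocabulary scores per position (len(tokens) rows).
    The final position has no next token, so it contributes no loss.
    """
    losses = []
    for t in range(len(tokens) - 1):
        row, target = logits[t], tokens[t + 1]
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))  # log-sum-exp
        losses.append(log_z - row[target])  # -log softmax(row)[target]
    return sum(losses) / len(losses)


# Usage: uniform logits over the 53-token vocabulary.
tokens = [0, 3, 7, 1]
uniform = [[0.0] * 53 for _ in tokens]
print(next_token_loss(uniform, tokens))
```

With uniform logits the loss is ln 53 ≈ 3.97, the loss of a uniform predictor, which gives a simple reference point for judging a trained model.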
Note: Solutions for this problem are not yet available.