Caesar Cipher (January 2024)
Colab: problem | solutions

Task & Dataset
In a Caesar cipher, all the letters in the plaintext are shifted by some fixed number of positions in the alphabet. For example, if the shift is 3, then A becomes D, B becomes E, etc.
Each sequence in the dataset is a string of characters, where the first 2 characters represent the shift key (just duplicated characters, e.g. AA for shift 0, BB for shift 1, etc), and the rest of the characters are the encoded message. The model's task is to output the decoded message.
Model
The model is a 1-layer transformer with 1 attention head, and causal attention. It includes layernorm, but no MLP layers.