Exercise Status: Mostly complete (white box section needs proof-reading, 'thought branches' bonus not done).

☆ Bonus / exploring anomalies

Here are some directions for further exploration:

On-Policy vs Off-Policy Interventions

The Thought Branches paper compares: - On-policy: Resample from the model (what we've been doing) - Off-policy: Hand-edit the text or transplant from another model

They find that off-policy interventions are less stable - the model often "notices" the edit and behaves unexpectedly.

Faithfulness Analysis

The paper also studies unfaithfulness - cases where the model's CoT doesn't reflect its actual reasoning. They use "hinted" MMLU problems where a teacher provides a (possibly wrong) hint.

Open-Ended Exploration

  • Apply these techniques to your own model/domain
  • Investigate the relationship between model size and thought anchor patterns
  • Look for receiver heads in other model architectures
  • Compare thought anchor patterns across different reasoning tasks