☆ Bonus / exploring anomalies
Here are some directions for further exploration:
On-Policy vs Off-Policy Interventions
The Thought Branches paper compares: - On-policy: Resample from the model (what we've been doing) - Off-policy: Hand-edit the text or transplant from another model
They find that off-policy interventions are less stable - the model often "notices" the edit and behaves unexpectedly.
Faithfulness Analysis
The paper also studies unfaithfulness - cases where the model's CoT doesn't reflect its actual reasoning. They use "hinted" MMLU problems where a teacher provides a (possibly wrong) hint.
Open-Ended Exploration
- Apply these techniques to your own model/domain
- Investigate the relationship between model size and thought anchor patterns
- Look for receiver heads in other model architectures
- Compare thought anchor patterns across different reasoning tasks