Phase 3 Revision (June 2026):
The 1.5B hub was originally reported as L26 (93% depth) based on testing with fewer task families. Phase 3 multi-seed replication with the full 12-family suite revealed the true hub is L14 (50% depth).
Hub migration pattern: L2 (8%) → L14 (50%) → L34 (94%). All replicated across 3 seeds (std=0.0).
See Phase 3 for details.
🧭
Phase 2 thesis: Small language models have scale-dependent control surfaces. A behaviour may be localized, steerable, and surgically editable at one scale, but distributed, entangled, and resistant at another. Therefore specialist small models need causal atlases, not just benchmark scores.
Bottom line: Confirmed with a Phase 3 revision. The universal hub migrates from L2 (8% depth) at 0.5B to L14 (50%) at 1.5B to L34 (94%) at 3B when tested with the full task suite. Steering doesn't collapse at scale — it migrates and gets stronger. Each model size needs its own atlas.
The dominant causal hub changes with scale. Phase 2 found the migration; Phase 3 revised the 1.5B location after broadening the task suite.
Qwen2.5-0.5B: L2 · 8% of depth · 24 layers
→ Qwen2.5-1.5B: L14 · 50% of depth · 28 layers
→ Qwen2.5-3B: L34 · 94% of depth · 36 layers
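The depth percentages follow from dividing the hub's layer index by the model's layer count. A minimal sketch, assuming 0-indexed layers and rounding to the nearest whole percent (the function name is illustrative and not taken from the project code):

```python
def hub_depth_percent(layer_index: int, num_layers: int) -> int:
    """Relative depth of a layer as a whole percentage of the stack."""
    if not 0 <= layer_index < num_layers:
        raise ValueError("layer_index must lie in [0, num_layers)")
    return round(100 * layer_index / num_layers)


# (hub layer, total layers) as reported for each Qwen2.5 scale
hubs = {"0.5B": (2, 24), "1.5B": (14, 28), "3B": (34, 36)}
depths = {name: hub_depth_percent(*pair) for name, pair in hubs.items()}
print(depths)
```

The same function gives 93% for the earlier L26 reading at 1.5B, which is why the Phase 3 revision changes the picture so much.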
⚠️
Cross-family exception: SmolLM2-1.7B showed a flat Phase 2 ablation profile where no layer was clearly more important than the others. Hub position is architecture-specific, not purely depth-dependent. Each model family needs its own atlas.
8 findings from Phase 2, ranked by evidence strength.
NEW FINDING
Hub migration is real and monotonic
0.5B: L2 (8% depth) → 1.5B: L14 (50%, Phase 3 revision from the narrower Phase 2 L26 result) → 3B: L34 (94%). At 3B, the hub is the second-to-last layer. Top 5 layers at 3B in Phase 2: L34 (221.3), L27 (192.3), L13 (187.5), L22 (171.5), L18 (160.7). Evidence: layer ablation across 3 scales, revised by Phase 3 full-suite replication.
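The per-layer scores above read as divergences between the intact model's output and the ablated model's output. The exact formula is not given here, so the following is only a sketch under the assumption that the effect is KL(base ‖ ablated) over next-token distributions. The function names and toy logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(base_logits, ablated_logits):
    """KL(base || ablated) between two next-token distributions."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(ablated_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def rank_layers(base_logits, ablated_by_layer):
    """Sort layers by ablation effect, largest first."""
    effects = {
        layer: kl_divergence(base_logits, logits)
        for layer, logits in ablated_by_layer.items()
    }
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=True)


# Toy logits, made up for illustration only.
base = [2.0, 0.5, -1.0]
ablated = {
    "L0": [2.0, 0.5, -1.0],   # ablation changes nothing
    "L1": [0.0, 0.0, 0.0],    # ablation flattens the distribution
    "L2": [1.9, 0.6, -1.0],   # ablation nudges the distribution
}
ranking = rank_layers(base, ablated)
```

With this convention the layer at the head of `ranking` plays the role of the hub: removing it moves the output distribution the most.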
NEW FINDING
Steering did NOT collapse at 1.5B: it migrated and got 2–3× stronger
Phase 1 claimed steering "collapsed" at 1.5B (0.003 boost vs 0.213 at 0.5B). Phase 2 tested the RIGHT layers. Results at 1.5B: L6 = +3.14, L21 = +4.64, L26 = −5.88, multi-layer = −7.20. The best single-layer boost at 0.5B was only +2.16 (L12), so the best 1.5B boost is about 2.1× larger and the largest-magnitude 1.5B effect (multi-layer, −7.20) is about 3.3× larger. Steering leverage INCREASES with scale when targeting the correct hub layers. The earlier "collapse" came from testing the wrong layer (L2).
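How the steering vectors were built and applied is not described in this section. As a rough sketch of the general technique, additive steering shifts a layer's hidden state along a fixed direction, scaled by a signed strength. The unit normalisation and the function name below are assumptions.

```python
import math


def steer(hidden, direction, strength):
    """Return hidden + strength * unit(direction).

    hidden and direction are equal-length lists of floats; strength may be
    negative, which pushes the state the opposite way along the direction.
    """
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [h + strength * d / norm for h, d in zip(hidden, direction)]


# Toy two-dimensional state, for illustration only.
pushed = steer([1.0, 2.0], [3.0, 4.0], 5.0)
pulled = steer([1.0, 2.0], [3.0, 4.0], -5.0)
```

In a real model the same update would be applied to the residual stream at one chosen layer during the forward pass, which is why the choice of layer matters so much.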
CONFIRMED
Layer skipping remains fatal at all scales
0% top-5 overlap for all skip configurations at 0.5B, 1.5B, and 3B. Every layer is necessary. Naive inference compression via layer removal is invalid. Structured pruning with recovery training is the only viable path. Evidence: layer skipping tests across 3 scales.
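The precise definition of "top-5 overlap" is not stated here. One plausible reading is the fraction of the full model's five most likely next tokens that also appear in the layer-skipped model's top five. The sketch below implements that reading. The function name and toy logits are hypothetical.

```python
def top_k_overlap(base_logits, modified_logits, k=5):
    """Fraction of the base model's top-k token ids kept by the modified model."""
    if len(base_logits) != len(modified_logits):
        raise ValueError("both logit vectors must cover the same vocabulary")
    if k > len(base_logits):
        raise ValueError("k cannot exceed the vocabulary size")

    def top_k(logits):
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return set(order[:k])

    return len(top_k(base_logits) & top_k(modified_logits)) / k


# Toy 10-token vocabulary, for illustration only.
full_model = [9.0, 8.0, 7.0, 6.0, 5.0, 0.0, 0.1, 0.2, 0.3, 0.4]
skipped = [0.4, 0.3, 0.2, 0.1, 0.0, 5.0, 6.0, 7.0, 8.0, 9.0]

unchanged = top_k_overlap(full_model, full_model)
broken = top_k_overlap(full_model, skipped)
```

Under this definition, 0% overlap means none of the original top-5 predictions survive the skip, which is a much stronger failure than a reshuffled ranking.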
CONFIRMED
L0 MLP is always the top MLP
Across 0.5B, 1.5B, and 3B, L0 MLP consistently has the largest ablation effect. Early processing matters universally regardless of scale. This is the one structural invariant across scales.
NEW FINDING
Cross-family hubs are architecture-specific
SmolLM2-1.7B (24 layers) did not show a Qwen-like hub under the Phase 2 reduced atlas: the layer profile was effectively flat, so the recorded L0 winner should be treated as a tie artifact. This is fundamentally different from Qwen's hub at L2 (0.5B), L14 (1.5B), or L34 (3B). Hub position depends on architecture, task suite, and measurement sensitivity. Evidence: cross-family reduced atlas.
PARTIALLY CONFIRMED
Ablation method affects absolute effect sizes but preserves rank ordering
6 ablation types compared (zero, mean, resample, clean→corrupt, corrupt→clean, random patch). Zero ablation gives the largest effects. Mean ablation and resample give smaller but correlated effects. Top-3 layer ranking is preserved across methods for most task families. Evidence: ablation controls at 0.5B (24 layers × 6 types) and 1.5B (28 layers × 6 types).
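To make the difference between methods concrete, here is a minimal sketch of three of the six ablation types (zero, mean, resample), applied to a batch of activations stored as plain lists. It assumes one activation vector per example. The patching variants are omitted, and the function names are illustrative rather than the project's own.

```python
import random


def zero_ablate(acts):
    """Replace every activation vector with zeros."""
    return [[0.0] * len(row) for row in acts]


def mean_ablate(acts):
    """Replace every activation vector with the per-dimension batch mean."""
    n = len(acts)
    mean = [sum(col) / n for col in zip(*acts)]
    return [list(mean) for _ in acts]


def resample_ablate(acts, rng):
    """Replace each vector with one drawn from a shuffled copy of the batch.

    A plain shuffle can leave an example paired with itself; a stricter
    variant would reject those cases.
    """
    donors = list(range(len(acts)))
    rng.shuffle(donors)
    return [list(acts[j]) for j in donors]


# Toy batch of two 2-dimensional activations, for illustration only.
acts = [[1.0, 2.0], [3.0, 4.0]]
zeroed = zero_ablate(acts)
averaged = mean_ablate(acts)
resampled = resample_ablate(acts, random.Random(0))
```

Zero ablation pushes the activation furthest from anything the model saw in training, which fits the observation that it produces the largest effects.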
PARTIALLY CONFIRMED
code_semantics is the most surgically separable skill
Skill Separability Score (SSS), highest first: code_semantics = 0.361 (best), json_schema = 0.221, copying = 0.219, factual_recall = 0.218, delimiter_tracking = 0.215. Code semantics has the highest insertion gain (+3.70) and moderate localization sharpness. Collateral damage is zero for all skills tested. Evidence: skill separability benchmark on 0.5B, 5 skills, 5 operations each.
PARTIALLY CONFIRMED
Hub stability varies by task across prompt lengths
Long-task robustness tested at short (5–30 tokens), medium (30–150), and long (150–1000). Hub is stable for JSON and code tasks but shifts for factual recall and copying. Steering effectiveness degrades with prompt length for factual recall. LoRA adapter loading fails across model sizes (adapter is model-specific). Evidence: Block I robustness tests on 0.5B and 1.5B.
Key metrics across 0.5B, 1.5B, and 3B.
| Metric | 0.5B | 1.5B | 3B | Trend |
|---|---|---|---|---|
| Architecture | 24L, 14H, d=896 | 28L, 12H, d=1536 | 36L, 16H, d=2048 | Deeper, wider |
| Universal hub layer | L2 | L14 | L34 | Nonlinear migration |
| Hub depth (%) | 8% | 50% | 94% | Nonlinear migration |
| Hub max KL | 19.11 | 13.70 | 221.34 | Dips at 1.5B, then far larger at 3B |
| MLP max effect (L0) | 8.12 | 2.58 | pending | 3× weaker at 1.5B |
| Head max effect | 0.046 | 1.023 | pending | 22× stronger at 1.5B |
| Best steering boost | +2.16 | +4.64 | pending | 2× stronger at 1.5B |
| Multi-layer steering | +1.17 | −7.20 | pending | Much stronger at scale (sign reverses) |
| Layer skipping | 0% overlap | 0% overlap | 0% overlap | Always fatal |
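The multiplicative trends quoted for the 0.5B to 1.5B comparison can be checked directly from the reported values. A small worked calculation follows. The dictionary keys are illustrative labels, and the numbers are the ones in the table.

```python
# (0.5B value, 1.5B value) as reported in the cross-scale table
reported = {
    "mlp_max_effect": (8.12, 2.58),
    "head_max_effect": (0.046, 1.023),
    "best_steering_boost": (2.16, 4.64),
}

# Ratio of the 1.5B value to the 0.5B value for each metric
ratios = {name: big / small for name, (small, big) in reported.items()}

for name, ratio in ratios.items():
    if ratio >= 1:
        print(f"{name}: {ratio:.2f}x stronger at 1.5B")
    else:
        print(f"{name}: {1 / ratio:.2f}x weaker at 1.5B")
```

The results (roughly 3.1× weaker, 22.2× stronger, and 2.1× stronger) match the rounded figures in the Trend column.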
9 blocks, prioritized by value. Eight are done (Block D with 5 of 8 sub-experiments); Block H is partial.
| Block | Description | Models | Status |
|---|---|---|---|
| A | Parity verification | 1.5B | DONE |
| B | Steering migration | 0.5B, 1.5B | DONE |
| C | Ablation controls (6 types) | 0.5B, 1.5B | DONE |
| D | Third scale point (3B) | 3B | DONE (5/8 sub-experiments) |
| E | Cross-family replication | SmolLM2-1.7B | DONE |
| F | Adapter surgery | 0.5B, 1.5B | DONE |
| G | Skill separability benchmark | 0.5B | DONE |
| H | Deobfuscation surgery | 0.5B, 1.5B | PARTIAL |
| I | Long-task robustness | 0.5B, 1.5B | DONE |
10 actionable rules from Phase 2 evidence.
RULE 1
Each model size needs its own atlas
After Phase 3 full-suite replication, the hub moves from L2 to L14 to L34 across a 6× scale-up (0.5B to 3B). What works at 0.5B does not automatically work at 1.5B or 3B. Never transfer intervention locations across scales without re-mapping.
RULE 2
Each architecture family needs its own atlas
SmolLM2 did not show a Qwen-like hub under the reduced atlas. Qwen has L2/L14/L34 across the tested scales. Hub position is architecture-specific. Do not assume cross-architecture transfer.
RULE 3
Test steering at ALL candidate hub layers, not just the ablation winner
Phase 1 missed 1.5B steering because it only tested L2. Testing L6, L21, L26, L27 revealed stronger steering at 1.5B. Always sweep all plausible candidate layers, not just the current ablation winner.
RULE 4
Steering direction can reverse at scale
At 0.5B, positive steering at L2 boosts factual recall. At 1.5B, tested candidate layers respond to negative steering (L21) or have reversed effects (L26). Always test both directions.
RULE 5
Every layer is necessary — do not skip layers
0% top-5 overlap for all skip configurations at all scales. Use structured pruning with recovery training if you need inference compression.
RULE 6
Use multiple ablation methods, not just zero ablation
Zero ablation overestimates some effects. Mean ablation and resample ablation give more nuanced maps. Claim stability should be checked against at least 2 methods.
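One simple way to run that stability check is to compare the top-ranked layers under each method. The sketch below assumes each method yields a mapping from layer name to effect size. The function names and the toy numbers are made up for illustration.

```python
def top_layers(effects, k=3):
    """Names of the k layers with the largest effect, in rank order."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    return [layer for layer, _ in ranked[:k]]


def ranking_is_stable(effects_by_method, k=3):
    """True if every ablation method yields the same top-k layer order."""
    tops = [top_layers(effects, k) for effects in effects_by_method.values()]
    return all(top == tops[0] for top in tops)


# Toy effect sizes: absolute values differ, rank order agrees.
agreeing = {
    "zero": {"L0": 5.0, "L1": 9.0, "L2": 7.0, "L3": 1.0},
    "mean": {"L0": 2.0, "L1": 4.0, "L2": 3.0, "L3": 0.5},
}
# Toy effect sizes where the second method promotes a different layer.
disagreeing = {
    "zero": {"L0": 5.0, "L1": 9.0, "L2": 7.0, "L3": 1.0},
    "mean": {"L0": 0.2, "L1": 4.0, "L2": 3.0, "L3": 3.5},
}
```

A claim backed only by the `disagreeing` pattern would deserve a weaker label than one where the methods agree.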
RULE 7
LoRA adapters are model-specific — do not load across model sizes
Loading a 0.5B adapter into 1.5B or 3B causes dimension mismatch errors. Each model size needs its own trained adapters.
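A cheap guard is to compare the adapter's input width against the target model's hidden size before attempting to load it. This sketch uses the hidden sizes reported for the three Qwen2.5 scales (d=896, 1536, 2048). The lookup table and function are hypothetical helpers, not part of any adapter library, and real loaders raise their own errors.

```python
# Hidden sizes as reported for each scale in this atlas.
HIDDEN_SIZE = {
    "Qwen2.5-0.5B": 896,
    "Qwen2.5-1.5B": 1536,
    "Qwen2.5-3B": 2048,
}


def check_adapter(adapter_in_features: int, target_model: str) -> None:
    """Raise ValueError if the adapter's input width does not fit the model."""
    expected = HIDDEN_SIZE[target_model]
    if adapter_in_features != expected:
        raise ValueError(
            f"adapter expects d={adapter_in_features}, "
            f"but {target_model} has d={expected}"
        )


def is_compatible(adapter_in_features: int, target_model: str) -> bool:
    """Boolean wrapper around check_adapter for quick filtering."""
    try:
        check_adapter(adapter_in_features, target_model)
    except ValueError:
        return False
    return True
```

A width match is necessary but not sufficient: an adapter trained on one model can load cleanly into another of the same width and still behave poorly.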
RULE 8
code_semantics is the best candidate for surgical skill injection
Highest SSS (0.361), highest insertion gain (+3.70), zero collateral damage. If you need one skill to surgically add to a small model, code semantics is the cleanest target.
RULE 9
Steering effectiveness may degrade with prompt length
Long-task robustness shows hub stability varies by task. Factual recall steering degrades at longer prompts. Test your steering interventions at the prompt lengths you'll actually use.
RULE 10
L0 MLP is always important — invest in early-layer analysis
Across the Qwen scales tested, L0 MLP consistently has the largest ablation effect among MLP blocks. Early processing matters at every scale. If you're doing quick triage, start with L0.
Phase 2 methodology additions.
📋
Phase 2 added: Multi-method ablation (6 types: zero, mean, resample, clean→corrupt, corrupt→clean, random patch). Steering migration sweeps (7 layers × 8 strengths × 4 vector types). Adapter surgery (norm profiles, compatibility matrix, rank truncation). Skill separability benchmark (5 operations per skill). Cross-family reduced atlas. Long-prompt robustness testing. Experiment registry with 35 entries. Claim card system. Reproducible configs in YAML.
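The steering migration sweep is a full grid over layers, strengths, and vector types, which comes to 7 × 8 × 4 = 224 configurations. The sketch below enumerates such a grid. The specific layers, strengths, and vector-type names are placeholders, since the actual values are not listed here.

```python
from itertools import product

# Placeholder values: the real sweep's layers, strengths, and vector types
# are not given in this summary.
layers = [2, 6, 12, 14, 21, 26, 27]            # 7 candidate layers
strengths = [-8, -4, -2, -1, 1, 2, 4, 8]       # 8 strengths, both signs
vector_types = ["type_a", "type_b", "type_c", "type_d"]  # 4 vector types

grid = [
    {"layer": layer, "strength": strength, "vector_type": vector_type}
    for layer, strength, vector_type in product(layers, strengths, vector_types)
]
print(len(grid))
```

Including both signs in the strength list keeps the sweep in line with the rule about testing both steering directions.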
Phase 3 tightens these findings with multi-seed full-suite replication.
"The atlas is NOT transferable across scales or architectures. Each model needs its own map. The control surface changes with scale, family, task type, and fine-tuning method."
35 registered experiments · 16 task families · 3 model scales · 2 architectures · 6 ablation types
Results: experiments/results/ · Reports: reports/phase2/