We trained LoRA adapters on Qwen2.5-0.5B using three strategies — atlas-guided (targeting layers identified by our causal atlas), random-matched (same number of random layers), and all-linear (standard practice) — across three task families.
| Strategy | Params | JSON Loss | JSON Exact | Factual Loss | Code Loss |
|---|---|---|---|---|---|
| Atlas-guided | 319K (0.065%) | 0.019 | 1.000 | 0.006 | 0.034 |
| Random-matched | 319K (0.065%) | 0.062 | 1.000 | 0.043 | 0.056 |
| All-linear | 4.4M (0.88%) | 0.007 | 1.000 | 0.007 | 0.000 |
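
Confirmed: at the same parameter budget, atlas-guided targeting reaches lower loss than random-matched layers on all three task families. This validates the core MI-Atlas methodology of mapping the causal surface first, then training against it.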
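
The three strategies differ only in which layers receive adapters, which can be sketched as follows. This is a minimal illustration, not the study's training code: `select_layers`, `ATLAS_LAYERS`, and the indices in it are hypothetical placeholders, and the 24-layer depth is assumed from the public Qwen2.5-0.5B config.

```python
import random


def select_layers(strategy, num_layers, atlas_layers, seed=0):
    """Return the sorted layer indices a LoRA adapter would target."""
    if strategy == "atlas":
        # Layers picked by the causal atlas.
        return sorted(atlas_layers)
    if strategy == "random":
        # Same number of layers as the atlas selection, drawn uniformly.
        rng = random.Random(seed)
        return sorted(rng.sample(range(num_layers), k=len(atlas_layers)))
    if strategy == "all":
        # Standard practice: adapt every layer.
        return list(range(num_layers))
    raise ValueError(f"unknown strategy: {strategy}")


NUM_LAYERS = 24              # assumed depth of Qwen2.5-0.5B
ATLAS_LAYERS = [21, 22, 23]  # hypothetical placeholder, NOT the study's selection

for name in ("atlas", "random", "all"):
    print(name, select_layers(name, NUM_LAYERS, ATLAS_LAYERS, seed=42))
```

A list like this could be passed to PEFT's `LoraConfig` through its `layers_to_transform` argument. Whether the random baseline should exclude the atlas layers is a design choice this sketch leaves open.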

The universal processing hub migrates from early to late layers as model size increases, from 8% depth at 0.5B to 94% depth at 3B. Phase 3 replicated this across 3 scales with 3 seeds each (std=0.0 in every case), using the full 12-family task suite (4,300 examples).
| Model | Hub Layer | Depth | Seeds | Std | Status |
|---|---|---|---|---|---|
| Qwen2.5-0.5B | L2 | 8% | 42, 137, 256 | 0.0 | Replicated |
| Qwen2.5-1.5B | L14 | 50% | 42, 137, 256 | 0.0 | Revised |
| Qwen2.5-3B | L34 | 94% | 42, 137, 256 | 0.0 | Replicated |
| Qwen2.5-Coder-0.5B | L22 | 92% | 1 | — | New |
| SmolLM2-1.7B | L0 | 0% | 1 | — | Pilot |

New gem: hub location depends on task-suite breadth. At 1.5B the hub moved from L26 to L14 when the suite expanded from 4 to 12 families. Always use the widest available suite.
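
The Depth column is consistent with dividing the hub layer index by the model's layer count. The sketch below reproduces it under that assumption. The layer counts are not stated in this document and are taken from each model's public config, and `hub_depth_percent` is a hypothetical helper name.

```python
def hub_depth_percent(hub_layer, num_layers):
    """Relative depth of the hub layer, as a whole-number percentage."""
    return round(100 * hub_layer / num_layers)


# model -> (hub layer from the table, assumed number of transformer layers)
MODELS = {
    "Qwen2.5-0.5B": (2, 24),
    "Qwen2.5-1.5B": (14, 28),
    "Qwen2.5-3B": (34, 36),
    "Qwen2.5-Coder-0.5B": (22, 24),
    "SmolLM2-1.7B": (0, 24),
}

for model, (hub, depth) in MODELS.items():
    print(f"{model}: L{hub} -> {hub_depth_percent(hub, depth)}%")
```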

2,500 natural-language prompts (50+ per family) give the same hub as synthetic prompts: L2 is confirmed at 100% family agreement. Status: Validated.

With the same architecture (Qwen2.5) and the same scale (0.5B), code training moves the hub from L2 (8%) to L22 (92%), so hub location is not determined by scale alone. Status: New.

4-bit NF4 gives KL=10.0 at s=-4.0, versus KL=0.021 for bf16, a 476x amplification. Quantized models are more sensitive to interventions. Status: New.

Answer tokens dominate at all 24 layers (mean effect 8.65 vs 1.57 for BOS). Ablation-based hub identification is biased toward output layers. Status: New.

Module sweep: o_proj achieves loss=0.33 with 344K params, 10x more efficient than MLP (loss=0.45, 1.1M params). Status: Confirmed.

Residual stream activations have near-zero mean. Mean ablation gives all-zero effects. Zero ablation is valid, not "more destructive." Status: Confirmed.

| Rule | Evidence | Confidence |
|---|---|---|
| Each model needs its own atlas | Hub at L2/L14/L34/L22/L0 across 5 models | HIGH |
| Target late layers for LoRA | Adapter effects at final ~10% across 3 scales | HIGH |
| o_proj is the most efficient module | 344K params, loss=0.33 vs MLP 1.1M, loss=0.45 | HIGH |
| Use atlas-guided layer targeting | 13.8x fewer params, equal accuracy on JSON | HIGH |
| Use 4-bit NF4, not 8-bit | 8-bit 52% slower, 4-bit only 9% slower | HIGH |
| All layers are necessary | 0% top-5 overlap for all skip configs | HIGH |
| Test steering at ALL candidate layers | Phase 1 missed L21/L26 at 1.5B by only testing L2 | HIGH |
| Hub location depends on task suite breadth | L26→L14 at 1.5B when expanding from 4 to 12 families | NEW |
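
The parameter counts behind the efficiency rules can be checked with a short calculation. A LoRA adapter of rank r on a linear layer with input width d_in and output width d_out adds r × (d_in + d_out) weights. The sketch below assumes rank 8 and the public Qwen2.5-0.5B dimensions (hidden size 896, MLP width 4864, key/value width 128, 24 layers). None of these values are stated in this document, but together they reproduce the 344K and 4.4M figures.

```python
def lora_params(d_in, d_out, rank):
    """Weights added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed Qwen2.5-0.5B shapes and LoRA rank (not given in the report).
HIDDEN, INTER, KV_DIM, LAYERS, RANK = 896, 4864, 128, 24, 8

MODULES = {  # module name -> (d_in, d_out)
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTER),
    "up_proj": (HIDDEN, INTER),
    "down_proj": (INTER, HIDDEN),
}


def total_params(module_names, num_layers=LAYERS, rank=RANK):
    """Total LoRA weights when the named modules are adapted in every layer."""
    per_layer = sum(lora_params(*MODULES[m], rank) for m in module_names)
    return num_layers * per_layer


print(total_params(["o_proj"]))     # 344064, about 344K
print(total_params(list(MODULES)))  # 4399104, about 4.4M
print(round(4.4e6 / 319e3, 1))      # 13.8, the all-linear to atlas-guided ratio
```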

| Component | Detail |
|---|---|
| Scripts | 16 Phase 3 experiment scripts (6,560+ lines) |
| Orchestrator | 24 blocks across 6 priority levels |
| Models | Qwen2.5-0.5B, 1.5B, 3B, Coder-0.5B, SmolLM2-1.7B |
| Seeds | 42, 137, 256 (3 seeds per replication) |
| Task families | 12 families, 4,300 examples |
| NL prompts | 2,500 natural language prompts (50+ per family) |
| Hardware | RTX 2070 Super 8GB (aero) |
| Total compute | ~14 hours |

4 of 18 blocks remain. Each has a known bug (API mismatches, PEFT wrapper issues) that needs a targeted fix:
| Block | Description | Issue |
|---|---|---|
| C4 | Steering controls (random-vector baseline) | Needs HF-native steering API |
| G1 | Steering direction transfer across scales | Memory management for 2-model load |
| G3 | Checkpoint lock-in at 1.5B | PEFT wrapper attribute access |
| G4 | Atlas-guided layer skip + recovery | Recovery finetune DataLoader fix |

Main Phase 3 success criteria are met: replicated LoRA targeting rule, hub migration warning, causal explanation for scale differences, and deobfuscation improvement. Four follow-up blocks remain open.
All result files are in experiments/results/, and the registry is at experiments/registry.jsonl.
Claims, threats, and gems are recorded in claims.md, threats.md, and gems.md.
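
Since the registry is a JSON Lines file, it can be read one record per line. The sketch below is a generic reader. The record schema is not described here, so it makes no assumption about field names, and `load_registry` is a hypothetical helper rather than part of the project's scripts.

```python
import json
from pathlib import Path


def load_registry(path):
    """Parse a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return records


if __name__ == "__main__":
    registry = Path("experiments/registry.jsonl")
    if registry.exists():  # only runs inside the project checkout
        print(len(load_registry(registry)), "records")
```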