Which layers matter?
I tested where behavior moves inside small models, then used that map to target LoRA, steering, ablation, patching, and negative controls. The result is not a universal theory of language models. It is a practical rule: map the model you plan to edit before you edit it.
The atlas links behavior to components, training changes, activations, and interventions. The core measurements are causal: remove, patch, steer, train, compare.
Residual, MLP, head, module, and position-specific ablations measure how much output distributions move when a component is removed.
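A minimal sketch of the ablation measurement, assuming a toy residual stack in place of a real model. The layer updates, the `run` helper, and the choice of KL divergence as the "how much did the output distribution move" metric are illustrative assumptions, not the project's actual hooks.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Toy residual stack: each "layer" adds a fixed update to the hidden state.
# These numbers are made up for illustration, not measured from any model.
TOY_LAYER_UPDATES = [
    [0.1, 0.0, -0.1],
    [0.2, -0.1, 0.0],
    [2.0, -1.0, -1.0],  # deliberately large update, standing in for a hub
    [0.0, 0.1, 0.1],
]

def run(hidden, ablate=None):
    """Run the toy stack, skipping the residual update of layer `ablate`."""
    h = list(hidden)
    for idx, update in enumerate(TOY_LAYER_UPDATES):
        if idx == ablate:
            continue  # ablation: this layer contributes nothing
        h = [a + b for a, b in zip(h, update)]
    return softmax(h)  # treat the final hidden state as logits

def ablation_scores(hidden):
    """KL between the intact output and the output with each layer removed."""
    base = run(hidden)
    return [kl_divergence(base, run(hidden, ablate=i))
            for i in range(len(TOY_LAYER_UPDATES))]

scores = ablation_scores([0.0, 0.0, 0.0])
hub = max(range(len(scores)), key=scores.__getitem__)
print(hub, [round(s, 4) for s in scores])
```

On a real model the same loop would run per task family and per position, which is where first-token and last-token effects would show up separately.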
Activation patching tests whether a component carries behavior by transferring clean or trained activations into a different run.
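A small sketch of the patching logic, assuming a toy model whose residual stream is the sum of two named components. The component names, inputs, and the `recovery` score are hypothetical; the point is only that one component's clean activation is substituted into a corrupted run, not the whole stream.

```python
def comp_a(x):
    return [2.0 * x[0], 0.0]  # depends on the first input only

def comp_b(x):
    return [0.0, 0.5 * x[1]]  # depends on the second input only

COMPONENTS = {"a": comp_a, "b": comp_b}

def forward(x, patch=None):
    """Sum component outputs into a residual stream.

    `patch` maps a component name to a cached activation that replaces
    whatever the component would have computed on this input.
    """
    patch = patch or {}
    cache = {}
    resid = [0.0, 0.0]
    for name, fn in COMPONENTS.items():
        out = patch.get(name, fn(x))
        cache[name] = out
        resid = [r + o for r, o in zip(resid, out)]
    return sum(resid), cache  # scalar stands in for a behavior metric

clean_input, corrupt_input = [1.0, 1.0], [-1.0, 1.0]
clean_metric, clean_cache = forward(clean_input)
corrupt_metric, _ = forward(corrupt_input)

# Patch one component at a time and ask how much clean behavior returns.
recovery = {}
for name in COMPONENTS:
    patched_metric, _ = forward(corrupt_input, patch={name: clean_cache[name]})
    recovery[name] = (patched_metric - corrupt_metric) / (clean_metric - corrupt_metric)

print(recovery)
```

Patching every component at once would recover the clean output by construction, so the informative comparison is between components.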
Adapters are trained on controlled task families, then compared against the base model by layer, module, norm, and downstream behavior.
Activation additions test whether a direction can increase, suppress, or damage a behavior. Steering claims remain control-sensitive.
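A sketch of why steering needs a control, assuming a toy linear readout as the "behavior". The readout weights, hidden state, and strength are made up; `task_direction` is a hypothetical direction aligned with the readout, and the control draws random directions of the same norm.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale_to(v, target_norm):
    n = math.sqrt(dot(v, v))
    return [x * target_norm / n for x in v]

READOUT = [1.0, -1.0, 0.5, 0.0]  # toy readout weights, not from any model

def behavior(hidden):
    return dot(READOUT, hidden)

def steer(hidden, direction, strength):
    """Activation addition: hidden + strength * direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]

hidden = [0.2, 0.1, -0.3, 0.4]
strength = 2.0
task_direction = scale_to(READOUT, 1.0)  # hypothetical "task" direction

rng = random.Random(0)

def random_unit(dim):
    """Norm-matched random direction for the control condition."""
    return scale_to([rng.gauss(0.0, 1.0) for _ in range(dim)], 1.0)

base = behavior(hidden)
task_shift = behavior(steer(hidden, task_direction, strength)) - base
control_shifts = [
    behavior(steer(hidden, random_unit(len(hidden)), strength)) - base
    for _ in range(200)
]
mean_abs_control = sum(abs(s) for s in control_shifts) / len(control_shifts)

print(round(task_shift, 3), round(mean_abs_control, 3))
```

A steering effect is only interesting to the extent it exceeds what norm-matched random directions do at the same layer and strength.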
The data-format ablation holds content fixed across 6 formats. Real judge scores from mimo-v2.5 show the base model outperforming all adapters. Loss is inversely correlated with quality: adapters that reached lower training loss scored worse, and the formats that were harder to learn from (structured terse, single-turn chat) produced better adapters.
Full SFT OOMed on 8GB VRAM, naive layer skipping broke outputs, and some early patching and scoring setups were not useful.
These are the claims that survive the current audit. Confidence is intentionally conservative.
| Claim | Evidence | Confidence | Details |
|---|---|---|---|
| Qwen2.5-0.5B has a stable L2 causal hub under the main suite. | L2 is top under ablation across 12 families. Phase 3 replicated hub=L2 across seeds 42, 137, and 2026 with std=0.0. Position-specific ablation shows first-token and last-token effects, not a uniform layer. | High | Open 0.5B atlas |
| Qwen hub location changes with scale and task suite breadth. | Full-suite hubs: Qwen2.5-0.5B L2, Qwen2.5-1.5B L14, Qwen2.5-3B L34. Each replicated across 3 seeds with std=0.0. The 1.5B hub was revised from L26 to L14 after widening the suite. | High | Open Phase 3 |
| Atlas-guided LoRA is parameter-efficient on JSON schema following. | On Qwen2.5-0.5B JSON, atlas-guided LoRA used 319K trainable params and reached exact_match=1.000. All-linear also reached 1.000 but used 4.4M params. That is 13.8x more parameters. | High for JSON | Open LoRA evidence |
| Generic conclusions do not transfer cleanly across tasks. | Factual and code tasks did not match the clean JSON story. For code semantics, all-linear won exact match; atlas-guided had lower loss than random-matched but did not reach exact-match parity. | Medium | Open task split |
| Adapter effects are late-layer effects, not upstream propagation. | Adapter-only ablation on the JSON adapter gives norm-effect correlation around 0.85, with effects concentrated around L19-L23. The older "norm-effect separation" story is refuted. | Medium | Open adapter notes |
| Naive layer skipping is not a free speedup. | Layer skipping produced 0% top-5 token overlap across tested skip configs in the audited claim. Recovery fine-tuning is still open work. | High negative result | Open skip result |
| 4-bit NF4 is the practical inference choice in the tested 1.5B setup. | Qwen2.5-1.5B bf16 ran at 18.8 tok/s, 4-bit NF4 at 17.1 tok/s, and 8-bit at 9.0 tok/s. Quality stayed close in the qualitative checks. Causal-surface drift is still under test. | Medium-high | Open quant notes |
| Training loss is inversely correlated with behavioral quality at 230M scale. | Real mimo-v2.5 judge on 153 eval prompts: base model (3.17) beats all 8 adapters. Best adapter: single-turn chat (2.60). Worst: quality bsmagpie (1.93). Loss-quality correlation r ≈ −0.7. 300-example SFT causes catastrophic overfitting. | High | Open format ablation |
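Two of the table's figures can be rechecked from the rounded numbers it reports. The sketch below uses only those rounded values, so the results are approximate; the variable names are mine.

```python
# Parameter counts as reported in the table (rounded).
atlas_params = 319_000
all_linear_params = 4_400_000
param_ratio = all_linear_params / atlas_params

# Throughput on the tested Qwen2.5-1.5B setup, tokens per second.
tok_per_s = {"bf16": 18.8, "nf4_4bit": 17.1, "int8": 9.0}
relative_speed = {name: v / tok_per_s["bf16"] for name, v in tok_per_s.items()}

print(round(param_ratio, 1))
print({name: round(v, 2) for name, v in relative_speed.items()})
```

On these numbers NF4 keeps roughly nine tenths of bf16 throughput while 8-bit keeps about half.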
The useful part of the project is not that every early guess held up. Several did not. That is the point of the atlas.
Initial Qwen2.5-0.5B atlas. Found L2, LoRA rewiring, late-layer trained activations, and a suspiciously strong factual knockout result.
Scale and controls. Some claims held; some shifted. SmolLM2 did not support a simple universal hub rule.
Multi-seed replication and practical tests. The strongest result is atlas-guided LoRA for JSON with 13.8x fewer parameters.
An SFT sweep (39 runs) found that dataset format matters 5x more than hyperparameters. The Phase 9 format ablation with a real judge showed the base model beating all adapters at 230M scale. Loss correlated inversely with quality: lower loss meant more overfitting.
This is the applied training branch of the atlas: 39 supervised fine-tuning runs on LFM2.5-230M across datasets, optimizers, LoRA rank, target modules, learning rate, steps, and data volume.
In the Phase 8 sweep, smol-magpie-ultra reached loss 1.25 while Alpaca sat around 6.07. The dataset gap was much larger than optimizer or learning-rate gaps.
On smol-magpie-ultra, Adafactor gave loss 1.267 and KL 0.109 while using less memory than AdamW. Lion drifted more in this setup.
Higher rank helped less with each step up. Rank 8 used 213K trainable params in the sweep; rank 32 used 852K for a smaller additional loss gain.
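The 213K and 852K figures are consistent with the standard LoRA parameter count, which grows linearly with rank: a linear layer of shape `(d_in, d_out)` gets an `r x d_in` and a `d_out x r` matrix. A short sketch, where the layer shapes are hypothetical and not the LFM2.5-230M target modules:

```python
def lora_params(shapes, rank):
    """Trainable LoRA params for linear layers given as (d_in, d_out) pairs."""
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes)

# Hypothetical target shapes, for illustration only.
toy_shapes = [(1024, 1024), (1024, 256)]

r8 = lora_params(toy_shapes, rank=8)
r32 = lora_params(toy_shapes, rank=32)
reported_ratio = 852_000 / 213_000  # figures from the sweep

print(r8, r32, reported_ratio)
```

Quadrupling the rank quadruples the trainable parameters, so a smaller loss gain per step up in rank means falling returns per parameter.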
The hub plus o_proj setup used about 65K params and KL 0.039. All-linear used far more params and moved the model more, while reaching lower loss.
Real mimo-v2.5 judge scores: base model overall 3.17 vs best adapter 2.60 (single-turn chat). Loss is inversely correlated with quality (r ≈ −0.7). Formats harder to learn from (structured terse, single-turn chat) produced better adapters. 300-example SFT caused catastrophic overfitting at 230M scale.
Multi-turn verbose: best loss (1.37), worst judge score (1.99). Structured terse: worst loss (1.83), 2nd-best judge score (2.52). H4 strongly confirmed. This is the most important finding from Phase 9R.
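A loss-versus-judge correlation of this kind can be computed with a plain Pearson coefficient. The sketch below uses placeholder numbers, not the per-format results. The sign depends on whether loss is entered raw or negated so that "better is higher", so it shows both.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder numbers for illustration only; substitute per-format results.
toy_loss = [1.2, 1.5, 1.7, 2.0]
toy_judge = [1.8, 2.3, 2.0, 2.6]

r_raw = pearson(toy_loss, toy_judge)                    # loss entered as-is
r_flipped = pearson([-v for v in toy_loss], toy_judge)  # "better loss is higher"

print(round(r_raw, 3), round(r_flipped, 3))
```

With only a handful of formats the coefficient is noisy, so the per-format pairs are worth reading alongside it.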
These are useful, but I would not sell them as final laws.
In the LFM2.5-230M pass, L0 had the largest measured effect and early-layer steering moved outputs more. This used a corrected operator/MLP hook because full-layer zeroing caused cascade-zero artifacts.
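One way full-layer zeroing can cascade is sketched below, assuming "cascade-zero" means that downstream layers receive an all-zero residual stream. The toy uses bias-free branches, and `WEIGHTS` and `run` are hypothetical. This is not the corrected LFM2 hook itself, only the distinction it relies on.

```python
def branch(h, weight):
    # Bias-free toy branch: its output is zero whenever its input is zero.
    return [weight * v for v in h]

WEIGHTS = [0.5, -0.25, 0.1]  # one toy branch per layer

def run(hidden, zero_full_layer=None, zero_branch=None):
    """Residual stack: h <- h + branch(h). Returns final h and per-layer L1 norms."""
    h = list(hidden)
    norms = []
    for i, w in enumerate(WEIGHTS):
        update = branch(h, w)
        if i == zero_branch:
            update = [0.0] * len(h)  # remove only this layer's contribution
        h = [a + b for a, b in zip(h, update)]
        if i == zero_full_layer:
            h = [0.0] * len(h)       # wipes the residual stream itself
        norms.append(sum(abs(v) for v in h))
    return h, norms

start = [1.0, -2.0]
_, full_norms = run(start, zero_full_layer=0)
_, branch_norms = run(start, zero_branch=0)

print(full_norms)    # stream stays at zero after the zeroed layer
print(branch_norms)  # stream survives; only layer 0's update is missing
```

Zeroing only the branch output measures that layer's contribution, while zeroing the full layer output also removes everything that came before it.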
The 6-format ablation holds content fixed. Multi-turn verbose gave the lowest loss (1.37) but the worst judge score (1.99). Structured terse had the worst loss (1.83) but the 2nd-best judge score (2.52). Loss and quality are inversely correlated: lower loss means more overfitting at 230M scale.
For memory-constrained LFM2 SFT, the best current default is good multi-turn data, Adafactor, rank 8, and hub-targeted modules when low drift matters.
Steering looked stronger when tested at the right layers in larger Qwen models. Random-vector and shuffled-label controls are still required before treating this as a clean task-specific intervention.
Long-task checks suggest some hub locations and steering effects change with prompt length. This limits how far short synthetic prompts can be stretched.
This is the part that keeps the page honest.
The hub locations are model-specific. Even within Qwen, scale and training change the answer. Across architectures, the story changes again.
The page does not rely on attention maps, probes, or SAE labels alone. Claims need an intervention, a metric, and a control path.
Full SFT did not fit on 8GB VRAM. Full-residual patching was trivial. Early clean/corrupt pairs had tokenization problems. JSON knockout had limited room to suppress targets.
This is a reproducible research project, not a finished paper. The strongest results are ready to build on; several controls are still open.
What I would actually do if I had to edit a small model tomorrow.
Run ablation and patching on the exact model, task family, prompt style, and quantization you plan to use.
The 1.5B hub moved from L26 to L14 when the suite widened. Narrow suites can give confident wrong answers.
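A toy illustration of how a hub estimate can move when the suite widens, assuming the hub is taken as the layer with the largest mean ablation effect over the chosen task families. The effect values are invented for a 4-layer toy and are not the Qwen measurements; `hub_layer` is a hypothetical helper.

```python
def hub_layer(effects_by_family, families):
    """Layer with the largest mean ablation effect over the chosen families."""
    n_layers = len(next(iter(effects_by_family.values())))
    means = [
        sum(effects_by_family[f][layer] for f in families) / len(families)
        for layer in range(n_layers)
    ]
    return max(range(n_layers), key=means.__getitem__)

# Made-up per-layer ablation effects for a 4-layer toy model.
toy_effects = {
    "json":    [0.1, 0.2, 0.1, 0.9],
    "code":    [0.1, 0.8, 0.2, 0.3],
    "factual": [0.2, 0.9, 0.1, 0.1],
}

narrow_hub = hub_layer(toy_effects, ["json"])
wide_hub = hub_layer(toy_effects, ["json", "code", "factual"])

print(narrow_hub, wide_hub)
```

A single family can be dominated by a layer that matters little elsewhere, so the narrow answer is stable across seeds and still wrong for the wider suite.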
Atlas-guided LoRA was clearly parameter-efficient for JSON. Do not assume the same recipe wins every task.
Layer, strength, prompt length, and quantization all matter. Add random-vector and shuffled-label controls.
The naive skip experiments failed hard. If you want speed, test recovery fine-tuning rather than deleting layers and hoping.
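A sketch of a top-k token overlap check for comparing a base model against a layer-skipped variant at one position. The exact metric definition used in the skip experiments is an assumption here, and the logits are toy values.

```python
def top_k(logits, k):
    """Indices of the k largest logits."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return set(order[:k])

def top_k_overlap(base_logits, other_logits, k=5):
    """Fraction of the base model's top-k tokens that the other model keeps."""
    return len(top_k(base_logits, k) & top_k(other_logits, k)) / k

# Toy logits over a 10-token vocabulary.
base = [5.0, 4.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0]
scrambled = list(reversed(base))  # stands in for a badly broken model

same = top_k_overlap(base, base)
broken = top_k_overlap(base, scrambled)

print(same, broken)
```

Averaging this over positions and prompts gives a cheap first screen before spending compute on recovery fine-tuning.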
The negative results were not cleanup. They changed the method and killed bad claims.