SAELens

1. Gemma Scope 2

Gemma Scope 2 is a comprehensive, open suite of sparse autoencoders and transcoders for a range of model sizes and versions in the Gemma 3 model family. We have SAEs on three different sites (as well as transcoders) for every layer of the pretrained and instruction-tuned models of parameter sizes 270M, 1B, 4B, 12B and 27B. We also include several multi-layer SAE variants: partial residual stream crosscoders for every base Gemma 3 model, and cross-layer transcoders for the 270M and 1B models.

Sparse Autoencoders are a "microscope" of sorts that can help us break down a model's internal activations into the underlying concepts, just as biologists use microscopes to study the individual cells of plants and animals.

You can read more in our blog post, and also see our landing page for details on the whole suite.

2. What Is In This Repo?

This repo contains a specific set of SAEs and transcoders: the ones trained on Gemma 3 270M PT. Each folder in this repo contains a different suite of models and is named for the type of model that was trained:

  • Single-layer models
    • resid_post, attn_out and mlp_out contain SAEs at 4 different layers (25%, 50%, 65% and 85% depth) and a variety of widths & L0 values, trained on the model's residual stream, attention output, and MLP output respectively.
    • transcoder contains a range of transcoders (or skip-transcoders) trained on the same 4 layers, with a variety of widths & L0 values.
    • resid_post_all, attn_out_all, mlp_out_all and transcoder_all contain a smaller range of widths & L0 values, but for every single layer in the model.
  • Multi-layer models
    • crosscoder contains a set of weakly causal crosscoders, which were trained on 4 concatenated layers of the residual stream (the same 4 layers used for our layer-subset SAEs).
    • clt contains a set of cross-layer transcoders, which were trained to reconstruct the whole model's MLP outputs from the residual stream values just before each MLP layer.

So for example, google/gemma-scope-2-270m-pt/resid_post contains a range of SAEs trained on the residual stream of gemma-v3-270m-pt at 4 different layers.
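
If you want to see which variants are available in each folder before loading anything, you can browse the repo programmatically. The snippet below is a minimal sketch using huggingface_hub's list_repo_files to print the top-level folder names described above.

from huggingface_hub import list_repo_files  # pip install huggingface_hub

# List every file in the repo and collect the top-level folder names,
# e.g. resid_post, attn_out, mlp_out, transcoder, crosscoder, clt, ...
files = list_repo_files("google/gemma-scope-2-270m-pt")
print(sorted({f.split("/")[0] for f in files if "/" in f}))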

3. How can I use these SAEs straight away?

from sae_lens import SAE  # pip install sae-lens

# The release name corresponds to a folder in this repo, and the sae_id selects
# the layer / width / L0 variant within it.
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gemma-scope-2-270m-pt-resid_post",
    sae_id="layer_12_width_16k_l0_small",
)
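
Once loaded, the SAE can be applied to activations from the matching site and layer of gemma-v3-270m-pt. The minimal sketch below uses a random tensor in place of real residual-stream activations, purely to show the encode/decode round trip and the shapes involved; in practice you would capture activations from the model at the corresponding hook point.

import torch

# Dummy activations shaped (batch, sequence, d_in), where d_in is the SAE's input width.
# Swap in real residual-stream activations from the matching layer in practice.
acts = torch.randn(1, 8, sae.cfg.d_in)

feature_acts = sae.encode(acts)   # sparse feature activations (mostly zeros)
recon = sae.decode(feature_acts)  # reconstruction of the input activations
print(feature_acts.shape, recon.shape)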

4. Which SAE should I use?

Unless you're doing full circuit-style analysis, we recommend using SAEs / transcoders from the layer subset folders, e.g. resid_post or transcoder. Assuming you're using residual stream SAEs from resid_post, then:

  • Width: our SAEs have widths 16k, 64k, 256k, 1m. You can visit Neuronpedia to get a qualitative sense of what kinds of features you can find at different widths, but we generally recommend using 64k or 256k.
  • L0: our SAEs have target L0 values "small" (10-20), "medium" (30-60) or "large" (60-150). You can also look at the config.json file saved with every SAE's parameters to check exactly what the L0 is (or just visit the Neuronpedia page!); see the sketch after this list. We generally recommend "medium", which works well for most tasks, although this may vary depending on your exact use case. Again, you can visit Neuronpedia to get a sense of what kinds of features each model type finds.
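
For example, here is a minimal sketch of checking a variant's configuration: the same from_pretrained call from section 3 already returns the config as a dictionary, so you can simply print it. The width and L0 tags in the sae_id are the parts you would swap for other variants.

from sae_lens import SAE

# Load one variant and print its config, which records details such as the exact L0.
sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="gemma-scope-2-270m-pt-resid_post",
    sae_id="layer_12_width_16k_l0_small",  # swap the width / L0 tags for other variants
)
print(cfg_dict)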

5. Point of Contact

Callum McDougall ([email protected])

6. Citation

Paper link here
