Gemma Scope 2

A comprehensive, open suite of interpretability tools designed for the Gemma 3 model collection, allowing researchers to analyze complex language model behaviors.

Gemma Scope 2 serves as a microscope for large language models. It provides Sparse Autoencoders (SAEs) and Transcoders trained on every layer of the Gemma 3 family. By decomposing high-dimensional activations into human-inspectable features, researchers can trace internal logic without relying solely on input-output analysis.

🔬

Microscope for AI

Trace complex behaviors by inspecting internal features across all Gemma 3 layers.

🛡️

AI Safety

Investigate jailbreaks, hallucinations, and chatbot safety mechanisms at the feature level.

🔄

SAEs & Transcoders

Decompose dense activations into sparse, interpretable sets of active concepts.

🔍 How It Works

💾

Capture Activations

As Gemma 3 handles a wide range of prompts, the activations produced at every layer are recorded. This activation data is what Gemma Scope 2 works on.
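
To make the capture step concrete, here is a minimal sketch of recording per-layer activations during a forward pass. It uses a hypothetical toy network with random weights (`weights`, `forward_with_capture`, and the sizes are all illustrative assumptions), not Gemma 3 or any Gemma Scope 2 API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network standing in for a real model: 3 layers of width 8.
D_MODEL, N_LAYERS = 8, 3
weights = [rng.normal(scale=0.5, size=(D_MODEL, D_MODEL)) for _ in range(N_LAYERS)]


def forward_with_capture(x):
    """Run the toy network and record the output activation of every layer."""
    captured = {}
    h = x
    for layer_idx, w in enumerate(weights):
        h = np.tanh(h @ w)
        captured[layer_idx] = h.copy()  # store a copy so later layers cannot alter it
    return h, captured


# Four input vectors play the role of token representations.
batch = rng.normal(size=(4, D_MODEL))
output, activations = forward_with_capture(batch)

for layer_idx, act in activations.items():
    print(f"layer {layer_idx}: activation shape {act.shape}")
```

With a real model the same idea is usually implemented with framework hooks rather than a hand-written loop, but the result is the same: one activation tensor per layer, per prompt.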

⚙️

Decompose Features

Sparse Autoencoders (SAEs) break dense activation vectors down into sparse, distinct features, each representing a specific concept (e.g., 'coding logic' or 'politeness').
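
The decomposition can be sketched as an encode/decode pair. This is an illustrative toy under stated assumptions: the weights are random rather than trained, the encoder uses a plain ReLU, and every name and dimension (`W_enc`, `W_dec`, `D_MODEL`, `N_FEATURES`) is hypothetical. The released SAEs may use a different activation function and parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 16-dim activation expanded into 64 candidate features.
D_MODEL, N_FEATURES = 16, 64

# Random, untrained parameters (assumption: a real SAE learns these).
W_enc = rng.normal(scale=0.1, size=(D_MODEL, N_FEATURES))
b_enc = np.full(N_FEATURES, -0.8)  # negative bias keeps most features at zero
W_dec = rng.normal(scale=0.1, size=(N_FEATURES, D_MODEL))
b_dec = np.zeros(D_MODEL)


def sae_encode(x):
    """Map dense activations to non-negative, mostly-zero feature activations."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)


def sae_decode(f):
    """Reconstruct the dense activation as a weighted sum of feature directions."""
    return f @ W_dec + b_dec


x = rng.normal(size=(5, D_MODEL))      # five dense activation vectors
features = sae_encode(x)               # shape (5, 64), sparse
reconstruction = sae_decode(features)  # shape (5, 16)

sparsity = float((features > 0).mean())
print(f"fraction of active features: {sparsity:.3f}")
```

Each row of `W_dec` acts as the direction for one feature, so the handful of non-zero entries in `features` says which concepts are active for a given input. With untrained weights the reconstruction is not meaningful; training is what makes the features both faithful and interpretable.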

📊

Analyze & Debug

Researchers trace which features fire during unwanted behaviors (like hallucinations) to understand the model's internal reasoning steps.
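
One simple way to find which features fire during an unwanted behavior is to compare mean feature activations on flagged prompts against a baseline set. The sketch below uses synthetic data with one planted feature; the function name `top_differential_features` and the data are hypothetical, and this is one possible analysis rather than the method Gemma Scope 2 prescribes.

```python
import numpy as np


def top_differential_features(flagged, baseline, k=3):
    """Return the k feature indices whose mean activation is most elevated
    on flagged examples relative to baseline, with the size of each gap."""
    diff = flagged.mean(axis=0) - baseline.mean(axis=0)
    order = np.argsort(diff)[::-1][:k]  # largest gap first
    return order, diff[order]


rng = np.random.default_rng(0)

# Synthetic feature activations: 50 examples x 10 features per group.
baseline = np.abs(rng.normal(scale=0.1, size=(50, 10)))
flagged = np.abs(rng.normal(scale=0.1, size=(50, 10)))
flagged[:, 7] += 2.0  # plant a feature that fires strongly on flagged examples

idx, scores = top_differential_features(flagged, baseline, k=3)
print("candidate features:", idx.tolist())
```

The top-ranked indices are candidates for closer inspection, for example by reading the inputs that activate them most strongly.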

🌐 Full Family Support

Gemma Scope 2 covers the entire Gemma 3 family. This broad coverage is critical because complex, emergent behaviors often appear only at larger model scales.

Gemma 3 270M: Supported

Gemma 3 1B: Supported

Gemma 3 4B: Supported

Gemma 3 12B: Supported

Gemma 3 27B: Supported (Emergent Behaviors)

Advancing Open AI Safety

By providing open access to the weights, code, and documentation for the Gemma 3 interpretability suite, Google DeepMind empowers the AI safety community to build safer, more transparent agents for the future.