Gemma Scope

Google

Introduction

Gemma Scope is a versatile suite of sparse autoencoders designed for analyzing the internal activations of large language models, specifically Gemma 2 9B and 2B. These autoencoders function like microscopes, deconstructing a model's internal workings to reveal underlying concepts. The repository does not contain model weights; however, links to various repositories with different configurations of the Gemma Scope models are provided.

Architecture

Gemma Scope utilizes sparse autoencoders (SAEs) to dissect the internal representations of models. These SAEs have been trained on activations from multiple sites within each layer, including attention outputs, MLP outputs, and the residual stream, across model sizes including 2B, 9B, and 27B. The SAEs are implemented in frameworks like PyTorch and JAX, allowing for flexible adaptation and use in different research or production environments.
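The encode-gate-decode structure described above can be sketched as follows. This is a minimal illustration, not the released implementation: the class name, shapes, and initialization are hypothetical, and only the overall pattern (a wide encoder, a JumpReLU gate with a per-feature threshold, and a linear decoder) reflects the architecture.

```python
import numpy as np

def jumprelu(z, threshold):
    """JumpReLU: zero out any pre-activation at or below its learned threshold."""
    return z * (z > threshold)

class JumpReLUSAE:
    """Minimal SAE sketch (hypothetical shapes and init, for illustration only)."""

    def __init__(self, d_model, d_sae, seed=0):
        rng = np.random.default_rng(seed)
        # The feature dimension d_sae is typically much larger than d_model.
        self.W_enc = rng.normal(size=(d_model, d_sae)) * 0.02
        self.W_dec = rng.normal(size=(d_sae, d_model)) * 0.02
        self.b_enc = np.zeros(d_sae)
        self.b_dec = np.zeros(d_model)
        self.threshold = np.full(d_sae, 0.01)

    def encode(self, x):
        # Project activations into the wider feature basis, then gate.
        return jumprelu(x @ self.W_enc + self.b_enc, self.threshold)

    def decode(self, f):
        # Reconstruct the original activation from the sparse features.
        return f @ self.W_dec + self.b_dec

    def forward(self, x):
        f = self.encode(x)
        return self.decode(f), f
```

Because the gate zeroes most features, the returned feature vector is sparse and non-negative, which is what makes individual features interpretable as concepts.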

Training

Tutorials demonstrate training and using the SAEs with PyTorch and JAX. They cover loading the SAEs and implementing the JumpReLU activation function, a ReLU variant with a learned per-feature threshold that improves the trade-off between reconstruction fidelity and sparsity. Detailed instructions are provided in Google Colab notebooks, facilitating experimentation and exploration of Gemma Scope's capabilities.
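The training objective balances two terms: how well the SAE reconstructs the original activation, and how few features it uses to do so. A simplified sketch of that loss (the actual training makes the sparsity term differentiable with respect to the JumpReLU thresholds via straight-through estimators, which is omitted here):

```python
import numpy as np

def sae_loss(x, recon, feats, sparsity_coeff=1e-3):
    """Reconstruction error plus an L0 sparsity penalty (simplified sketch).

    x      : original activations, shape (batch, d_model)
    recon  : SAE reconstructions,  shape (batch, d_model)
    feats  : sparse feature values, shape (batch, d_sae)
    """
    # Squared reconstruction error, averaged over the batch.
    recon_err = np.mean(np.sum((x - recon) ** 2, axis=-1))
    # L0 penalty: average number of active (nonzero) features per example.
    l0 = np.mean(np.sum(feats != 0, axis=-1))
    return recon_err + sparsity_coeff * l0
```

The `sparsity_coeff` value is illustrative; sweeping it trades reconstruction quality against the average number of active features.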

Guide: Running Locally

To run Gemma Scope locally, follow these steps:

  1. Clone the Repository:
    Clone the relevant GitHub repository containing the configuration you wish to use.

  2. Install Dependencies:
    Ensure you have Python and necessary packages such as PyTorch or JAX installed. Use a virtual environment for managing dependencies.

  3. Load the Model:
    Follow the instructions in the Google Colab notebook to load and test the SAEs.

  4. Train or Evaluate:
    Use the provided scripts to train or evaluate the models on your data.

  5. Cloud GPUs:
    For intensive computations, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure to expedite processing.
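Once a checkpoint is obtained, loading it reduces to reading the parameter arrays and wiring up the encoder. The sketch below writes placeholder parameters to disk so it runs self-contained; the `.npz` layout and key names (`W_enc`, `W_dec`, `b_enc`, `b_dec`, `threshold`) follow the convention used in the tutorial notebooks and are an assumption here, as is the local file name.

```python
import numpy as np

# Hypothetical local path; in practice, substitute a downloaded checkpoint.
PARAMS_PATH = "sae_params.npz"

# Write placeholder parameters so the round trip below is self-contained.
d_model, d_sae = 8, 32
rng = np.random.default_rng(0)
np.savez(
    PARAMS_PATH,
    W_enc=rng.normal(size=(d_model, d_sae)) * 0.02,
    W_dec=rng.normal(size=(d_sae, d_model)) * 0.02,
    b_enc=np.zeros(d_sae),
    b_dec=np.zeros(d_model),
    threshold=np.full(d_sae, 0.01),
)

# Load the archive into a plain dict of arrays.
params = dict(np.load(PARAMS_PATH))

def encode(x, p):
    """Map model activations to sparse features with a JumpReLU gate."""
    z = x @ p["W_enc"] + p["b_enc"]
    return z * (z > p["threshold"])

features = encode(np.zeros((4, d_model)), params)
```

With zero inputs and zero encoder bias, every pre-activation falls below the threshold, so all features are gated off, which is the expected behavior of the sparsity gate.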

License

The Gemma Scope project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0), allowing for sharing and adaptation with appropriate credit.
