granite-3.1-8b-base-GGUF

A GGUF quantized version of ibm-granite/granite-3.1-8b-base, provided by QuantFactory.

Introduction

Granite-3.1-8B-Base is a language model developed by the Granite Team at IBM. It extends the context length capabilities of its predecessor, Granite-3.0-8B-Base, from 4K to 128K using a progressive training strategy. The model is designed for various text generation tasks and supports multiple languages.
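
The card does not detail the extension recipe, but a common ingredient in this kind of progressive long-context training is raising the RoPE base frequency (theta) so the slowest rotary components complete far fewer cycles over long sequences. The sketch below only illustrates that effect; the theta values are hypothetical examples, not IBM's published settings.

    import numpy as np

    def rope_frequencies(head_dim: int, theta: float) -> np.ndarray:
        # Rotary embedding frequencies: freq_i = theta^(-2i/head_dim), i = 0..head_dim/2 - 1
        return theta ** (-np.arange(0, head_dim, 2) / head_dim)

    head_dim = 128  # attention head size (see Architecture below)
    for theta in (10_000.0, 10_000_000.0):  # hypothetical short- vs. long-context bases
        wavelength = 2 * np.pi / rope_frequencies(head_dim, theta)[-1]
        print(f"theta={theta:9.0f} -> slowest component repeats every ~{wavelength:,.0f} tokens")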

Architecture

The Granite-3.1-8B-Base model employs a decoder-only dense transformer architecture. Key components include grouped-query attention (GQA) with rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. The key hyperparameters are listed below; a rough parameter-count check based on them follows the list:

  • Embedding size: 4096
  • Number of layers: 40
  • Attention head size: 128
  • Number of attention heads: 32
  • MLP hidden size: 12800
  • Position embedding: RoPE
  • Number of parameters: 8.1B
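
As a sanity check, these hyperparameters roughly reproduce the stated 8.1B parameters. The sketch below assumes 8 key/value heads for GQA and a vocabulary of roughly 49K tokens, neither of which appears in the list above; norms and biases are ignored as negligible.

    # Rough parameter count from the hyperparameters above.
    # Assumed (not listed here): 8 KV heads for GQA, ~49,155-token vocabulary.
    d_model, n_layers, n_heads, head_dim = 4096, 40, 32, 128
    d_mlp, n_kv_heads, vocab = 12800, 8, 49_155

    attn = d_model * n_heads * head_dim          # Q projection
    attn += 2 * d_model * n_kv_heads * head_dim  # K and V (GQA uses fewer heads)
    attn += n_heads * head_dim * d_model         # output projection
    mlp = 3 * d_model * d_mlp                    # SwiGLU: gate, up, and down projections
    embed = vocab * d_model                      # shared input/output embeddings, counted once

    total = n_layers * (attn + mlp) + embed
    print(f"~{total / 1e9:.2f}B parameters")     # ~8.17B, close to the listed 8.1B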

Training

Granite-3.1-8B-Base was trained using a three-stage strategy on a combination of open-source and proprietary data:

  1. Stage 1: Diverse domains such as web, code, academic sources, and books.
  2. Stage 2: A curated mix of high-quality multilingual and instruction data.
  3. Stage 3: Synthetic long-context data in the form of QA/summary pairs.

The training utilized IBM's supercomputing cluster, Blue Vela, with NVIDIA H100 GPUs.

Guide: Running Locally

To run the Granite-3.1-8B-Base model locally, follow these steps:

  1. Install necessary libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Set up the model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_path = "ibm-granite/granite-3.1-8b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" spreads the weights across available GPU(s), or CPU otherwise;
    # bfloat16 halves memory relative to the default float32
    model = AutoModelForCausalLM.from_pretrained(
        model_path, device_map="auto", torch_dtype=torch.bfloat16
    )
    model.eval()
    
  3. Run an example:

    input_text = "Where is the Thomas J. Watson Research Center located?"
    # Send the inputs to the device holding the model's first layer
    input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_length=4000)  # max_length includes the prompt
    output = tokenizer.batch_decode(output, skip_special_tokens=True)
    print(output)
    

For optimal performance, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs.
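
Because this repository provides GGUF quantizations, the files can also be run without transformers through llama.cpp bindings such as llama-cpp-python. A minimal sketch follows; the quant filename is illustrative, so substitute the .gguf file you actually downloaded:

    pip install llama-cpp-python

    from llama_cpp import Llama
    
    # Point at a downloaded quant; the exact filename depends on the quantization level
    llm = Llama(model_path="granite-3.1-8b-base.Q4_K_M.gguf", n_ctx=4096)
    result = llm("Where is the Thomas J. Watson Research Center located?", max_tokens=64)
    print(result["choices"][0]["text"])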

License

The Granite-3.1-8B-Base model is available under the Apache 2.0 License.
