Granite-3.1-8B-Base


Introduction

Granite-3.1-8B-Base is a language model that extends the context length of its predecessor, Granite-3.0-8B-Base, from 4K to 128K tokens using a progressive training strategy. It is designed for text generation tasks such as summarization, text classification, and question answering, and supports multiple languages, including English, German, and Spanish.

Architecture

Granite-3.1-8B-Base uses a decoder-only dense transformer architecture. Key components include Grouped-Query Attention (GQA), Rotary Position Embeddings (RoPE), an MLP with SwiGLU activation, and RMSNorm. The model shares its input and output embeddings and supports a context length of 128K tokens.
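
Of these, GQA is the main departure from standard multi-head attention: groups of query heads share a single key/value head, which shrinks the KV cache at long context lengths. A minimal PyTorch sketch of the mechanism (the query-head count and head size come from the table below; the KV-head count of 8 is an assumption for illustration, not a value stated in this card):

    import torch

    # 32 query heads and head size 128 match the table below;
    # 8 KV heads is an assumed value for illustration only.
    n_q_heads, n_kv_heads, head_dim, seq_len = 32, 8, 128, 16
    q = torch.randn(1, n_q_heads, seq_len, head_dim)
    k = torch.randn(1, n_kv_heads, seq_len, head_dim)
    v = torch.randn(1, n_kv_heads, seq_len, head_dim)

    # Each group of query heads attends with the same K/V head, so K and V
    # are repeated along the head axis before standard attention.
    group_size = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim**0.5
    out = torch.softmax(scores, dim=-1) @ v  # (1, 32, seq_len, head_dim)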

Model Type: Dense
Embedding Size: 4096
Number of Layers: 40
Attention Head Size: 128
Number of Attention Heads: 32
MLP Hidden Size: 12800
MLP Activation: SwiGLU
Position Embedding: RoPE
Parameters: 8.1B
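
These dimensions can be verified against the published configuration. A quick sketch, assuming the repo id used later in this guide and the standard transformers attribute names:

    from transformers import AutoConfig

    # Attribute names follow the usual transformers convention; the printed
    # values should match the table above.
    config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-8b-base")
    print(config.hidden_size)              # embedding size: 4096
    print(config.num_hidden_layers)        # layers: 40
    print(config.num_attention_heads)      # attention heads: 32
    print(config.intermediate_size)        # MLP hidden size: 12800
    print(config.max_position_embeddings)  # context length: 128K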

Training

Granite-3.1-8B-Base was trained on a mix of open-source and proprietary data using a three-stage strategy:

  • Stage 1: Diverse domain data, including web, code, and academic sources.
  • Stage 2: High-quality data, with an emphasis on multilingual and instruction data.
  • Stage 3: Synthetic long-context data, including question-answer pairs, to extend the context window.

Training used IBM's Blue Vela supercomputing cluster, equipped with NVIDIA H100 GPUs.

Guide: Running Locally

To run Granite-3.1-8B-Base locally:

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Run Example Code:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "ibm-granite/granite-3.1-8b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" places the weights on available GPUs (falls back to CPU)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    model.eval()

    input_text = "Where is the Thomas J. Watson Research Center located?"
    # move the inputs to the device the model weights were placed on
    input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_new_tokens=100)
    print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
    
  3. Utilize Cloud GPUs: For optimal performance, consider using cloud services with GPU support, such as AWS, Google Cloud, or Azure.
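
Because the model accepts up to 128K tokens, long documents can be passed in a single prompt. A minimal sketch reusing the tokenizer and model from step 2 (the file name report.txt is a hypothetical placeholder, and framing summarization as plain text completion reflects that this is a base model, not an instruct variant):

    # Hypothetical long-context example; "report.txt" is a placeholder path.
    with open("report.txt") as f:
        document = f.read()

    # Base models continue text, so frame summarization as a completion.
    prompt = document + "\n\nSummary:"
    input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_new_tokens=256)
    print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])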

License

Granite-3.1-8B-Base is licensed under the Apache 2.0 License. For more details, refer to the Apache License, Version 2.0.
