Granite-3.1-3B-A800M-Base

ibm-granite

Introduction

Granite-3.1-3B-A800M-Base is a language model developed by IBM's Granite Team. It extends the context length of its predecessor, using a progressive training strategy to support contexts of up to 128K tokens. The model was pre-trained on approximately 500 billion tokens and is designed for general text generation tasks such as summarization, text classification, and question answering.

Architecture

The model uses a decoder-only sparse Mixture-of-Experts (MoE) transformer architecture, whose key components are fine-grained experts, dropless token routing, and a load-balancing loss. It features:

  • Embedding size: 1536
  • Number of layers: 32
  • Attention head size: 64
  • Number of attention heads: 24
  • MLP hidden size: 512
  • Number of experts: 40
  • Parameters: 3.3B total, 800M active
  • Position embedding: RoPE
  • Sequence length: 128K
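The sparse MoE design means that of the 3.3B total parameters, only about 800M are active for any given token: a router scores all experts per token, and only a few are evaluated. The following is a minimal NumPy sketch of top-k token routing plus a load-balancing auxiliary loss of the kind named above; the choice of k, the gate normalization, and the exact loss form are illustrative assumptions, not Granite's published router.

```python
import numpy as np

def topk_route(logits, k):
    """Pick the top-k experts per token and renormalize their gate weights.
    k and the normalization scheme are illustrative assumptions."""
    topk_idx = np.argsort(logits, axis=-1)[:, -k:]            # (tokens, k) expert ids
    topk_logits = np.take_along_axis(logits, topk_idx, axis=-1)
    exp = np.exp(topk_logits - topk_logits.max(axis=-1, keepdims=True))
    gates = exp / exp.sum(axis=-1, keepdims=True)             # softmax over chosen experts
    return topk_idx, gates

def load_balancing_loss(logits, topk_idx, num_experts):
    """Auxiliary loss encouraging uniform expert usage: for each expert,
    multiply its routed-token fraction by its mean router probability."""
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    counts = np.bincount(topk_idx.ravel(), minlength=num_experts)
    frac = counts / topk_idx.size                             # fraction of routed tokens per expert
    return num_experts * np.sum(frac * probs.mean(axis=0))

rng = np.random.default_rng(0)
router_logits = rng.standard_normal((8, 40))                  # 8 tokens, 40 experts
idx, gates = topk_route(router_logits, k=2)
aux_loss = load_balancing_loss(router_logits, idx, num_experts=40)
```

Each token's output is then the gate-weighted sum of its selected experts' outputs, which is why the active parameter count stays far below the total.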

Training

Granite-3.1-3B-A800M-Base was trained using a three-stage strategy:

  1. Stage 1: Data from diverse domains such as web, code, and academic sources.
  2. Stage 2: Curated high-quality data, including multilingual and instruction data.
  3. Stage 3: Synthetic long-context data in the form of QA/summary pairs.

Training was conducted on IBM's Blue Vela supercomputing cluster, equipped with NVIDIA H100 GPUs.

Guide: Running Locally

Steps to Run

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Run the Example Code:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    device = "auto"
    model_path = "ibm-granite/granite-3.1-3b-a800m-base"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" lets accelerate place the weights on available devices
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
    model.eval()
    
    input_text = "Where is the Thomas J. Watson Research Center located?"
    # move the inputs to the device the model's weights were placed on
    # (calling .to("auto") on a tensor would fail)
    input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_length=4000)
    output = tokenizer.batch_decode(output)
    print(output)
    

Suggested Cloud GPUs

For optimal performance, consider using cloud GPUs such as NVIDIA A100 or V100 available on platforms like AWS, Google Cloud, or Azure.
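As a rough back-of-the-envelope check when sizing a GPU (an estimate, not an official requirement), the weight-only memory for the 3.3B parameters at common dtypes can be computed:

```python
# Rough weight-only memory estimate for a 3.3B-parameter model.
# Activations and the KV cache (especially at long contexts) add on top,
# so treat these numbers as a lower bound.
total_params = 3.3e9
bytes_per_param = {"float32": 4, "bfloat16": 2, "int8": 1}
estimates_gib = {name: total_params * size / 2**30
                 for name, size in bytes_per_param.items()}
for name, gib in estimates_gib.items():
    print(f"{name}: ~{gib:.1f} GiB")
```

At bfloat16 the weights alone come to roughly 6 GiB, which is why a single modern data-center GPU is comfortably sufficient for inference.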

License

Granite-3.1-3B-A800M-Base is licensed under the Apache 2.0 License, allowing for both personal and commercial use.
