QuantFactory/granite-3.0-2b-base-GGUF

Introduction

Granite-3.0-2B-Base is a decoder-only language model designed for a variety of text-to-text generation tasks. Developed by the Granite Team at IBM, it supports multiple languages and can be fine-tuned for additional ones. The model is trained with a two-stage strategy on a total of 12 trillion tokens.

Architecture

Granite-3.0-2B-Base uses a decoder-only dense transformer architecture built from grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. The model comprises 40 layers with 32 attention heads of size 64 each, for approximately 2.5B parameters.
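
As a quick sanity check, these architecture fields can be read directly from the model's Hugging Face configuration. A minimal sketch, assuming the standard transformers attribute names (num_hidden_layers, num_attention_heads, hidden_size) apply to the Granite config:

    from transformers import AutoConfig
    
    # Attribute names are an assumption based on standard transformers configs
    config = AutoConfig.from_pretrained("ibm-granite/granite-3.0-2b-base")
    print(config.num_hidden_layers)    # expected: 40
    print(config.num_attention_heads)  # expected: 32
    # Head size = hidden size / number of attention heads (expected: 64)
    print(config.hidden_size // config.num_attention_heads)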

Training

The training involves a two-stage process:

  • Stage 1: Training on 10 trillion tokens from diverse domains like web, code, academic sources, books, and math data.
  • Stage 2: Further training on 2 trillion tokens using high-quality data to enhance task-specific performance.

The model is trained on IBM's Blue Vela supercomputing cluster, which uses NVIDIA H100 GPUs and runs on 100% renewable energy.

Guide: Running Locally

To run the Granite-3.0-2B-Base model locally, follow these steps:

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Run the Model:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_path = "ibm-granite/granite-3.0-2b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" lets accelerate place the model on available devices
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    model.eval()
    
    input_text = "Where is the Thomas J. Watson Research Center located?"
    # "auto" is not a valid tensor device, so move the inputs to the device
    # the model was actually placed on
    input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_length=4000)
    print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
    
  3. Suggested Cloud GPUs: For faster inference, consider GPU instances from cloud providers such as AWS, Google Cloud, or Azure.
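
Because this repository distributes GGUF quantizations, the model can also be run without the full transformers stack. A minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python); the filename granite-3.0-2b-base.Q4_K_M.gguf is a placeholder, substitute whichever quant file you download from this repository:

    from llama_cpp import Llama
    
    # Path is a placeholder; point it at the downloaded GGUF quant file
    llm = Llama(model_path="granite-3.0-2b-base.Q4_K_M.gguf")
    result = llm("Where is the Thomas J. Watson Research Center located?", max_tokens=64)
    print(result["choices"][0]["text"])

Quantized GGUF files trade a small amount of accuracy for a much smaller memory footprint, which makes them practical on CPU-only machines or small GPUs.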

License

The Granite-3.0-2B-Base model is licensed under the Apache 2.0 License.
