Granite-3.1-2B-Base-GGUF

QuantFactory

Introduction

Granite-3.1-2B-Base-GGUF is QuantFactory's quantized (GGUF) version of IBM's Granite-3.1-2B-Base, a model designed for a wide range of text-to-text generation tasks. It builds on the earlier Granite-3.0 model, extending the context length through a progressive training strategy.

Architecture

Granite-3.1-2B-Base uses a decoder-only dense transformer architecture. It incorporates grouped-query attention (GQA), rotary position embeddings (RoPE), MLPs with SwiGLU activation, RMSNorm, and shared input/output embeddings. Key parameters include the following (a config-inspection sketch follows the list):

  • Embedding size: 2048
  • Number of layers: 40
  • Attention head size: 64
  • Number of attention heads: 32
  • MLP hidden size: 8192
  • Position embedding: RoPE
  • Parameters: 2.5B
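
These values can be checked against the published model configuration. Below is a minimal sketch, assuming the checkpoint's config exposes the standard transformers attribute names such as hidden_size and num_hidden_layers (Granite follows the usual Llama-style naming):

    from transformers import AutoConfig

    # Download only the configuration (no weights) for the base checkpoint.
    cfg = AutoConfig.from_pretrained("ibm-granite/granite-3.1-2b-base")

    print(cfg.hidden_size)          # embedding size, expected 2048
    print(cfg.num_hidden_layers)    # number of layers, expected 40
    print(cfg.num_attention_heads)  # attention heads, expected 32
    print(cfg.intermediate_size)    # MLP hidden size, expected 8192
    print(cfg.tie_word_embeddings)  # shared input/output embeddings, expected True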

Training

The model follows a three-stage training strategy on a mix of open-source and proprietary data.

  • Stage 1: Diverse data from domains such as web, code, and academic sources.
  • Stage 2: Curated, high-quality data, including multilingual and instruction content.
  • Stage 3: The previous training data combined with synthetic long-context data to extend the model's context length.

Training was conducted using IBM's Blue Vela supercomputing cluster equipped with NVIDIA H100 GPUs.

Guide: Running Locally

To run the Granite-3.1-2B-Base model locally, follow these steps:

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Run the Example Code:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "ibm-granite/granite-3.1-2b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" lets Accelerate place the weights on the available device(s).
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    model.eval()
    input_text = "Where is the Thomas J. Watson Research Center located?"
    # Move inputs to the device the model landed on; "auto" is not a valid tensor device.
    input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
    # max_length caps prompt and generated tokens combined.
    output = model.generate(**input_tokens, max_length=4000)
    print(tokenizer.batch_decode(output)[0])
    
  3. Cloud GPUs: Consider using cloud GPUs from providers like AWS, Google Cloud, or Azure for more intensive workloads.
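
The GGUF files themselves target llama.cpp-compatible runtimes rather than transformers. Below is a minimal sketch using llama-cpp-python; the repository id and quantization filename are assumptions, so check the actual file list before running:

    # pip install llama-cpp-python huggingface_hub
    from llama_cpp import Llama

    # Repo id and filename pattern are assumptions; pick whichever quant is published.
    llm = Llama.from_pretrained(
        repo_id="QuantFactory/granite-3.1-2b-base-GGUF",
        filename="*Q4_K_M.gguf",
        n_ctx=4096,  # context window to allocate
    )

    out = llm("Where is the Thomas J. Watson Research Center located?", max_tokens=64)
    print(out["choices"][0]["text"])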

License

Granite-3.1-2B-Base-GGUF is licensed under Apache 2.0, which permits use, distribution, and modification with proper attribution.
