granite-3.1-8b-base-GGUF

A GGUF quantized version of ibm-granite/granite-3.1-8b-base, provided by QuantFactory.

Introduction

Granite-3.1-8B-Base is a language model developed by the Granite Team at IBM. It extends the context length capabilities of its predecessor, Granite-3.0-8B-Base, from 4K to 128K using a progressive training strategy. The model is designed for various text generation tasks and supports multiple languages.
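
The card does not detail the extension recipe, but a common ingredient in this kind of progressive long-context training is raising the RoPE base frequency (theta) so the slowest rotary components complete far fewer cycles over long sequences. The sketch below only illustrates that effect; the theta values are hypothetical examples, not IBM's published settings.

    import numpy as np

    def rope_frequencies(head_dim: int, theta: float) -> np.ndarray:
        # Rotary embedding frequencies: freq_i = theta^(-2i/head_dim), i = 0..head_dim/2 - 1
        return theta ** (-np.arange(0, head_dim, 2) / head_dim)

    head_dim = 128  # attention head size (see Architecture below)
    for theta in (10_000.0, 10_000_000.0):  # hypothetical short- vs. long-context bases
        wavelength = 2 * np.pi / rope_frequencies(head_dim, theta)[-1]
        print(f"theta={theta:9.0f} -> slowest component repeats every ~{wavelength:,.0f} tokens")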

Architecture

The Granite-3.1-8B-Base model employs a decoder-only dense transformer architecture. Key components include grouped-query attention (GQA) with rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. The key hyperparameters are listed below; a rough parameter-count check based on them follows the list:

  • Embedding size: 4096
  • Number of layers: 40
  • Attention head size: 128
  • Number of attention heads: 32
  • MLP hidden size: 12800
  • Position embedding: RoPE
  • Number of parameters: 8.1B
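
As a sanity check, these hyperparameters roughly reproduce the stated 8.1B parameters. The sketch below assumes 8 key/value heads for GQA and a vocabulary of roughly 49K tokens, neither of which appears in the list above; norms and biases are ignored as negligible.

    # Rough parameter count from the hyperparameters above.
    # Assumed (not listed here): 8 KV heads for GQA, ~49,155-token vocabulary.
    d_model, n_layers, n_heads, head_dim = 4096, 40, 32, 128
    d_mlp, n_kv_heads, vocab = 12800, 8, 49_155

    attn = d_model * n_heads * head_dim          # Q projection
    attn += 2 * d_model * n_kv_heads * head_dim  # K and V (GQA uses fewer heads)
    attn += n_heads * head_dim * d_model         # output projection
    mlp = 3 * d_model * d_mlp                    # SwiGLU: gate, up, and down projections
    embed = vocab * d_model                      # shared input/output embeddings, counted once

    total = n_layers * (attn + mlp) + embed
    print(f"~{total / 1e9:.2f}B parameters")     # ~8.17B, close to the listed 8.1B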

Training

Granite-3.1-8B-Base was trained using a three-stage strategy on a combination of open-source and proprietary data:

  1. Stage 1: Diverse domains such as web, code, academic sources, and books.
  2. Stage 2: A curated mix of high-quality multilingual and instruction data.
  3. Stage 3: Synthetic long-context data in the form of QA/summary pairs.

The training utilized IBM's supercomputing cluster, Blue Vela, with NVIDIA H100 GPUs.

Guide: Running Locally

To run the Granite-3.1-8B-Base model locally, follow these steps:

  1. Install necessary libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Set up the model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_path = "ibm-granite/granite-3.1-8b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" spreads the weights across available GPU(s), or CPU otherwise;
    # bfloat16 halves memory relative to the default float32
    model = AutoModelForCausalLM.from_pretrained(
        model_path, device_map="auto", torch_dtype=torch.bfloat16
    )
    model.eval()
    
  3. Run an example:

    input_text = "Where is the Thomas J. Watson Research Center located?"
    # Send the inputs to the device holding the model's first layer
    input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_length=4000)  # max_length includes the prompt
    output = tokenizer.batch_decode(output, skip_special_tokens=True)
    print(output)
    

For optimal performance, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs.
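
Because this repository provides GGUF quantizations, the files can also be run without transformers through llama.cpp bindings such as llama-cpp-python. A minimal sketch follows; the quant filename is illustrative, so substitute the .gguf file you actually downloaded:

    pip install llama-cpp-python

    from llama_cpp import Llama
    
    # Point at a downloaded quant; the exact filename depends on the quantization level
    llm = Llama(model_path="granite-3.1-8b-base.Q4_K_M.gguf", n_ctx=4096)
    result = llm("Where is the Thomas J. Watson Research Center located?", max_tokens=64)
    print(result["choices"][0]["text"])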

License

The Granite-3.1-8B-Base model is available under the Apache 2.0 License.
