Granite-3.1-8B-Base


Introduction

Granite-3.1-8B-Base is a language model that extends the context length of its predecessor, Granite-3.0-8B-Base, from 4K to 128K tokens using a progressive training strategy. It is designed for text generation tasks such as summarization, text classification, and question answering, and supports multiple languages, including English, German, and Spanish.

Architecture

Granite-3.1-8B-Base uses a decoder-only dense transformer architecture. Key components include Grouped-Query Attention (GQA), Rotary Position Embeddings (RoPE), an MLP with SwiGLU activation, and RMSNorm. The model shares its input and output embeddings and supports a context length of 128K tokens.
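
Of these, GQA is the main departure from standard multi-head attention: groups of query heads share a single key/value head, which shrinks the KV cache at long context lengths. A minimal PyTorch sketch of the mechanism (the query-head count and head size come from the table below; the KV-head count of 8 is an assumption for illustration, not a value stated in this card):

    import torch

    # 32 query heads and head size 128 match the table below;
    # 8 KV heads is an assumed value for illustration only.
    n_q_heads, n_kv_heads, head_dim, seq_len = 32, 8, 128, 16
    q = torch.randn(1, n_q_heads, seq_len, head_dim)
    k = torch.randn(1, n_kv_heads, seq_len, head_dim)
    v = torch.randn(1, n_kv_heads, seq_len, head_dim)

    # Each group of query heads attends with the same K/V head, so K and V
    # are repeated along the head axis before standard attention.
    group_size = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim**0.5
    out = torch.softmax(scores, dim=-1) @ v  # (1, 32, seq_len, head_dim)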

Model Type: Dense
Embedding Size: 4096
Number of Layers: 40
Attention Head Size: 128
Number of Attention Heads: 32
MLP Hidden Size: 12800
MLP Activation: SwiGLU
Position Embedding: RoPE
Parameters: 8.1B
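
These dimensions can be verified against the published configuration. A quick sketch, assuming the repo id used later in this guide and the standard transformers attribute names:

    from transformers import AutoConfig

    # Attribute names follow the usual transformers convention; the printed
    # values should match the table above.
    config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-8b-base")
    print(config.hidden_size)              # embedding size: 4096
    print(config.num_hidden_layers)        # layers: 40
    print(config.num_attention_heads)      # attention heads: 32
    print(config.intermediate_size)        # MLP hidden size: 12800
    print(config.max_position_embeddings)  # context length: 128K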

Training

Granite-3.1-8B-Base was trained on a mix of open-source and proprietary data using a three-stage strategy:

  • Stage 1: Diverse domain data, including web, code, and academic sources.
  • Stage 2: High-quality data, with an emphasis on multilingual and instruction data.
  • Stage 3: Synthetic long-context data, including question-answer pairs, to extend the context window.

Training used IBM's Blue Vela supercomputing cluster, equipped with NVIDIA H100 GPUs.

Guide: Running Locally

To run Granite-3.1-8B-Base locally:

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Run Example Code:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "ibm-granite/granite-3.1-8b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" places the weights on available GPUs (falls back to CPU)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    model.eval()

    input_text = "Where is the Thomas J. Watson Research Center located?"
    # move the inputs to the device the model weights were placed on
    input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_new_tokens=100)
    print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
    
  3. Utilize Cloud GPUs: For optimal performance, consider using cloud services with GPU support, such as AWS, Google Cloud, or Azure.
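
Because the model accepts up to 128K tokens, long documents can be passed in a single prompt. A minimal sketch reusing the tokenizer and model from step 2 (the file name report.txt is a hypothetical placeholder, and framing summarization as plain text completion reflects that this is a base model, not an instruct variant):

    # Hypothetical long-context example; "report.txt" is a placeholder path.
    with open("report.txt") as f:
        document = f.read()

    # Base models continue text, so frame summarization as a completion.
    prompt = document + "\n\nSummary:"
    input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_new_tokens=256)
    print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])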

License

Granite-3.1-8B-Base is licensed under the Apache 2.0 License. For more details, refer to the Apache License, Version 2.0.
