Granite-3.1-8B-Base
Introduction
Granite-3.1-8B-Base is a language model extending the context length of its predecessor, Granite-3.0-8B-Base, from 4K to 128K through progressive training strategies. It is designed for text generation tasks such as summarization, text classification, and question-answering. The model supports multiple languages, including English, German, Spanish, and more.
Architecture
Granite-3.1-8B-Base uses a decoder-only dense transformer architecture. Key components include Grouped-Query Attention (GQA), Rotary Position Embedding (RoPE), a Multi-Layer Perceptron (MLP) with SwiGLU activation, and RMSNorm. It features shared input/output embeddings and supports a sequence length of 128K tokens.
| Model Type | Dense 8B |
|---|---|
| Embedding Size | 4096 |
| Number of Layers | 40 |
| Attention Head Size | 128 |
| Number of Attention Heads | 32 |
| MLP Hidden Size | 12800 |
| MLP Activation | SwiGLU |
| Position Embedding | RoPE |
| Parameters | 8.1B |
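As a sanity check, the headline parameter count can be roughly reproduced from the table above. The sketch below assumes a vocabulary size of about 49K tokens and 8 key/value heads for GQA; neither figure appears in the table, so the result is an estimate rather than an official number.

```python
# Rough parameter estimate for Granite-3.1-8B-Base from the spec table.
hidden = 4096          # Embedding Size
layers = 40            # Number of Layers
head_size = 128        # Attention Head Size
n_heads = 32           # Number of Attention Heads
mlp_hidden = 12800     # MLP Hidden Size
vocab_size = 49152     # assumption: not listed in the table
kv_heads = 8           # assumption: GQA key/value head count

# GQA attention: full-size Q and output projections, smaller K/V projections.
attn = hidden * (n_heads * head_size)        # Q projection
attn += (n_heads * head_size) * hidden       # output projection
attn += 2 * hidden * (kv_heads * head_size)  # K and V projections

# SwiGLU MLP: gate + up projections into mlp_hidden, down projection back.
mlp = 2 * hidden * mlp_hidden + mlp_hidden * hidden

# Shared input/output embeddings are counted once.
embeddings = vocab_size * hidden

total = layers * (attn + mlp) + embeddings
print(f"~{total / 1e9:.2f}B parameters")  # close to the 8.1B in the table
```

The small gap between this estimate and the reported 8.1B comes from omitted norm weights and the assumed vocabulary and KV-head figures.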
Training
Granite-3.1-8B-Base was trained on a mix of open-source and proprietary data using a three-stage strategy:
- Stage 1: Diverse domain data, including web, code, and academic sources.
- Stage 2: High-quality data, including multilingual and instruction data.
- Stage 3: Synthetic long-context data, including question-answer pairs.

Training was performed on IBM's Blue Vela supercomputing cluster with NVIDIA H100 GPUs.
Guide: Running Locally
To run Granite-3.1-8B-Base locally:
1. Install Required Libraries:

```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
2. Run Example Code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.1-8B-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# device_map="auto" places the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

input_text = "Where is the Thomas J. Watson Research Center located?"
# Move the input tensors to the same device as the model
input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output, skip_special_tokens=True)[0])
```
3. Utilize Cloud GPUs: For optimal performance, consider using cloud services with GPU support, such as AWS, Google Cloud, or Azure.
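Before picking hardware, it helps to estimate memory needs. The back-of-envelope sketch below uses the 8.1B parameter count from the table; the KV-cache line additionally assumes 8 GQA key/value heads and a 128-per-head dimension, which are not stated in this card.

```python
# Back-of-envelope GPU memory estimate (weights only, plus KV cache at full context).
params = 8.1e9          # parameter count from the model card
bytes_per_param = 2     # bfloat16 / float16 weights
weight_gib = params * bytes_per_param / 2**30
print(f"~{weight_gib:.1f} GiB for weights")  # ~15.1 GiB

# KV cache per token: K and V, per layer, per KV head, per head dim, in bf16.
layers, kv_heads, head_size = 40, 8, 128   # kv_heads is an assumption
kv_bytes_per_token = 2 * layers * kv_heads * head_size * 2
context = 128 * 1024                        # full 128K context
kv_gib = context * kv_bytes_per_token / 2**30
print(f"~{kv_gib:.1f} GiB KV cache at 128K tokens")  # ~20.0 GiB
```

In short, the weights alone fit on a single 24 GiB GPU, but running near the full 128K context roughly doubles the memory footprint, which is why multi-GPU or high-memory cloud instances are recommended for long-context use.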
License
Granite-3.1-8B-Base is licensed under the Apache 2.0 License. For more details, refer to the Apache License, Version 2.0.