Granite-3.1-8B-Base GGUF
QuantFactory

Introduction
Granite-3.1-8B-Base is a language model developed by the Granite Team at IBM. It extends the context length capabilities of its predecessor, Granite-3.0-8B-Base, from 4K to 128K using a progressive training strategy. The model is designed for various text generation tasks and supports multiple languages.
Architecture
The Granite-3.1-8B-Base model employs a decoder-only dense transformer architecture. Key components include grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. The architecture is detailed as follows:
- Embedding size: 4096
- Number of layers: 40
- Attention head size: 128
- Number of attention heads: 32
- MLP hidden size: 12800
- Position embedding: RoPE
- Number of parameters: 8.1B
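As a quick consistency check on the figures above, the listed attention head size follows directly from the embedding size divided by the number of attention heads:

```python
# Architecture figures from the list above
embedding_size = 4096
num_attention_heads = 32

# Per-head dimension: the embedding is split evenly across attention heads
head_size = embedding_size // num_attention_heads
print(head_size)  # matches the listed attention head size of 128
```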
Training
Granite-3.1-8B-Base was trained using a three-stage strategy on a combination of open-source and proprietary data:
- Stage 1: Diverse domains such as web, code, academic sources, and books.
- Stage 2: A curated mix of high-quality multilingual and instruction data.
- Stage 3: Includes synthetic long-context data in the form of QA/summary pairs.
The training utilized IBM's supercomputing cluster, Blue Vela, with NVIDIA H100 GPUs.
Guide: Running Locally
To run the Granite-3.1-8B-Base model locally, follow these steps:
- Install the necessary libraries:

  ```shell
  pip install torch torchvision torchaudio
  pip install accelerate
  pip install transformers
  ```
- Set up the model:

  ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  device = "cuda" if torch.cuda.is_available() else "cpu"
  model_path = "ibm-granite/granite-3.1-8B-base"
  tokenizer = AutoTokenizer.from_pretrained(model_path)
  model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
  model.eval()
  ```
- Run an example:

  ```python
  input_text = "Where is the Thomas J. Watson Research Center located?"
  input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
  output = model.generate(**input_tokens, max_length=4000)
  output = tokenizer.batch_decode(output)
  print(output)
  ```
For optimal performance, consider using cloud GPU services such as AWS EC2 with NVIDIA GPUs.
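Since this repository distributes GGUF quantizations, the model can also be run without transformers via llama.cpp or its Python bindings. A minimal sketch using llama-cpp-python, assuming you have downloaded a quant file from this repository (the filename below is illustrative and depends on the quantization level you chose):

```python
from llama_cpp import Llama

# Path to a downloaded GGUF quant; the exact filename depends on the
# quantization level fetched from the repository (illustrative here).
llm = Llama(model_path="granite-3.1-8B-base.Q4_K_M.gguf", n_ctx=4096)

out = llm(
    "Where is the Thomas J. Watson Research Center located?",
    max_tokens=64,
)
print(out["choices"][0]["text"])
```

Quantized GGUF inference runs on CPU as well as GPU, trading some accuracy for a much smaller memory footprint than the full-precision transformers path above.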
License
The Granite-3.1-8B-Base model is available under the Apache 2.0 License.