Granite-3.0-2B-Base GGUF (QuantFactory)
Introduction
Granite-3.0-2B-Base is a decoder-only language model designed for a variety of text-to-text generation tasks. Developed by the Granite Team at IBM, it supports multiple languages natively and can be fine-tuned for additional ones. The model is trained with a two-stage strategy on a large, diverse dataset.
Architecture
Granite-3.0-2B-Base uses a dense, decoder-only transformer architecture built from grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, and RMSNorm, with shared input/output embeddings. The model has 40 layers and 32 attention heads with a head size of 64, for approximately 2.5B parameters in total.
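These hyperparameters can be inspected directly from the model's configuration without downloading the weights. A minimal sketch, assuming a transformers version that includes the Granite architecture and access to the Hugging Face Hub (the attribute names follow standard transformers config conventions):

```python
from transformers import AutoConfig

# Fetches only the small config JSON, not the model weights.
config = AutoConfig.from_pretrained("ibm-granite/granite-3.0-2b-base")

print(config.num_hidden_layers)    # expected: 40
print(config.num_attention_heads)  # expected: 32
print(config.model_type)           # architecture family name
```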
Training
The training involves a two-stage process:
- Stage 1: Training on 10 trillion tokens from diverse domains like web, code, academic sources, books, and math data.
- Stage 2: Further training on 2 trillion tokens using high-quality data to enhance task-specific performance.
The model is trained on IBM's Blue Vela supercomputing cluster, which uses NVIDIA H100 GPUs and runs on 100% renewable energy.
Guide: Running Locally
To run the Granite-3.0-2B-Base model locally, follow these steps:
- Install Required Libraries:

```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
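Before loading the model, a quick sanity check can confirm the installation. The source does not state minimum required versions, so this only verifies that the imports resolve and reports whether a GPU is visible:

```python
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```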
- Run the Model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.0-2b-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# device_map="auto" lets accelerate place the model on the available GPU(s) or CPU.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

input_text = "Where is the Thomas J. Watson Research Center located?"
# Move the inputs to whichever device the model was placed on.
input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)

output = model.generate(**input_tokens, max_length=4000)
print(tokenizer.batch_decode(output))
```
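Since this page concerns the GGUF quantization, the model can also be run through llama.cpp bindings instead of transformers. A minimal sketch, assuming llama-cpp-python is installed and a quantized file has been downloaded from the QuantFactory repository; the filename below is hypothetical and should be replaced with the actual GGUF file:

```python
from llama_cpp import Llama

# Hypothetical filename: substitute the GGUF file actually downloaded
# from the QuantFactory repository.
llm = Llama(model_path="granite-3.0-2b-base.Q4_K_M.gguf", n_ctx=4096)

out = llm("Where is the Thomas J. Watson Research Center located?", max_tokens=128)
print(out["choices"][0]["text"])
```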
- Suggested Cloud GPUs: Consider using cloud platforms like AWS, Google Cloud, or Azure that offer GPU instances to handle model inference more efficiently.
License
The Granite-3.0-2B-Base model is licensed under the Apache 2.0 License.