Granite-3.1-8B-Instruct (ibm-granite)
Introduction
Granite-3.1-8B-Instruct is a long-context instruction model developed by the Granite Team at IBM. The 8-billion-parameter model is fine-tuned for long-context tasks on a combination of open-source and synthetic datasets, supports multiple languages, and is designed for a range of AI tasks, including AI assistants for business applications.
Architecture
The model uses a decoder-only dense transformer architecture built from GQA, RoPE, MLPs with SwiGLU, RMSNorm, and shared input/output embeddings. It has 40 layers, an attention head size of 128, and a sequence length of 128K tokens. The broader Granite 3.1 family ships both dense and Mixture of Experts (MoE) variants with different total and active parameter counts; this 8B model is a dense configuration.
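These hyperparameters can be checked directly from the published configuration. The sketch below is illustrative and assumes standard Hugging Face GraniteConfig field names, which may vary across transformers versions:

  from transformers import AutoConfig

  # Fetch only the model configuration (no weights) and print the
  # architecture hyperparameters described above.
  config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-8b-instruct")
  print(config.num_hidden_layers)        # expected: 40 decoder layers
  print(config.num_attention_heads)      # query heads (grouped-query attention)
  print(config.num_key_value_heads)      # shared key/value heads for GQA
  print(config.max_position_embeddings)  # 128K-token context window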
Training
Granite-3.1-8B-Instruct is trained on a mix of publicly available datasets, synthetic data targeting specific capabilities, and a small amount of human-curated data. Training ran on IBM's Blue Vela supercomputing cluster, equipped with NVIDIA H100 GPUs, which enables efficient scaling across thousands of GPUs. On the ethics side, the model offers multilingual capabilities and safety alignment, though performance may vary across languages.
Guide: Running Locally
- Install Required Libraries:

  pip install torch torchvision torchaudio
  pip install accelerate
  pip install transformers
- Load the Model (a lower-memory quantized variant is sketched after these steps):

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  device = "auto"
  model_path = "ibm-granite/granite-3.1-8b-instruct"
  tokenizer = AutoTokenizer.from_pretrained(model_path)
  # device_map="auto" spreads the weights across available GPUs; drop it to run on CPU
  model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
  model.eval()
- Generate Text (a streaming variant is sketched after these steps):

  chat = [
      {"role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."},
  ]
  # Build the prompt string from the chat template
  chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
  # Tokenize and move inputs to the model's device ("auto" is not a valid tensor device)
  input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)
  output = model.generate(**input_tokens, max_new_tokens=100)
  output = tokenizer.batch_decode(output)
  print(output)
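If GPU memory is tight, the model can also be loaded with 4-bit quantization. This is an optional variation, not part of the official quick-start, and it assumes the bitsandbytes package is installed:

  from transformers import AutoModelForCausalLM, BitsAndBytesConfig

  # Optional: 4-bit NF4 quantization to reduce the memory footprint of the 8B model.
  # Requires `pip install bitsandbytes`; quality/speed trade-offs depend on hardware.
  bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
  model = AutoModelForCausalLM.from_pretrained(
      "ibm-granite/granite-3.1-8b-instruct",
      device_map="auto",
      quantization_config=bnb_config,
  )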
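For interactive use, generation can also stream tokens as they are produced rather than waiting for the full completion. A minimal sketch, reusing the tokenizer, model, and input_tokens from the steps above:

  from transformers import TextStreamer

  # Print tokens to stdout as they are generated; skip the echoed prompt.
  streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
  _ = model.generate(**input_tokens, max_new_tokens=100, streamer=streamer)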
Cloud GPUs Suggestion: Consider using cloud services like AWS, Google Cloud, or Azure for access to high-performance GPUs, which can efficiently handle the model's computational requirements.
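As a rough sizing guide (a back-of-the-envelope estimate, not an official figure), the weights of an 8-billion-parameter model alone occupy roughly 15 GB in bfloat16, before activations and the KV cache:

  # Back-of-the-envelope weight-memory estimate (assumption, not from the model card)
  params = 8_000_000_000        # ~8B parameters
  bytes_per_param = 2           # bfloat16 / float16
  print(f"~{params * bytes_per_param / 1024**3:.1f} GB for weights alone")  # ≈ 14.9 GB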
License
The Granite-3.1-8B-Instruct model is distributed under the Apache 2.0 license, which permits commercial use, modification, and redistribution.