Granite-3.0-8B-Instruct-GGUF

QuantFactory

Introduction

Granite-3.0-8B-Instruct is a language model developed by IBM's Granite Team for text generation tasks; this repository provides GGUF-quantized builds of it. The model has been finetuned on open-source and synthetic datasets to handle tasks such as summarization, text classification, and question answering, and it supports multiple languages.

Architecture

Granite-3.0-8B-Instruct is based on a decoder-only dense transformer architecture. Key features include grouped-query attention (GQA), rotary position embeddings (RoPE), SwiGLU MLPs, RMSNorm, and shared input/output embeddings. The Granite 3.0 family offers several dense and mixture-of-experts (MoE) configurations, with active parameters ranging from 400 million to 8.1 billion; this model is the 8B dense variant. It operates with a sequence length of 4096 tokens.
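
To verify these details against a given checkpoint, the configuration can be inspected without downloading the weights. The sketch below assumes the standard transformers AutoConfig API and Llama-style attribute names (num_key_value_heads, rope_theta, and so on); any attribute not present in this model's config class simply prints as "n/a".

    from transformers import AutoConfig

    # Load only the configuration (no weights) of the full-precision checkpoint
    config = AutoConfig.from_pretrained("ibm-granite/granite-3.0-8b-instruct")

    # Fields corresponding to the architectural choices above; a
    # num_key_value_heads value smaller than num_attention_heads indicates GQA
    for name in ["hidden_size", "num_attention_heads", "num_key_value_heads",
                 "hidden_act", "max_position_embeddings", "rope_theta",
                 "tie_word_embeddings"]:
        print(name, getattr(config, name, "n/a"))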

Training

The model was trained on IBM's Blue Vela supercomputing cluster using NVIDIA H100 GPUs. The training data comprises publicly available datasets, internal synthetic data, and human-curated data. As with other large language models, it may generate biased or unsafe responses, so safety testing is recommended before deployment.

Guide: Running Locally

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Load and Run the Model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Pick a concrete device; "auto" is valid for device_map but not for .to()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model_path = "ibm-granite/granite-3.0-8b-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
    model.eval()

    chat = [
        { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
    ]
    # Apply the model's chat template and append the generation prompt
    chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    # Tokenize and move the inputs to the same device as the model
    input_tokens = tokenizer(chat, return_tensors="pt").to(device)
    output = model.generate(**input_tokens, max_new_tokens=100)
    output = tokenizer.batch_decode(output)
    print(output)
    
  3. Using Cloud GPUs:
    Consider running the model on cloud platforms like AWS, Google Cloud, or Azure, which offer GPU instances to handle large models efficiently.
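
Because this repository distributes GGUF quantizations, the weights can also be run with llama.cpp-based tooling instead of the full-precision Transformers checkpoint. The sketch below is a non-authoritative example using the llama-cpp-python bindings; the repository id is assumed from this page's title, and the GGUF filename pattern is a placeholder to be replaced with a file actually published in the repository.

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Repo id assumed from this page. The filename glob is a placeholder;
    # substitute one of the GGUF files actually listed in the repository.
    llm = Llama.from_pretrained(
        repo_id="QuantFactory/granite-3.0-8b-instruct-GGUF",
        filename="*Q4_K_M.gguf",
        n_ctx=4096,  # matches the model's 4096-token sequence length
    )

    response = llm.create_chat_completion(
        messages=[
            {"role": "user", "content": "Please list one IBM Research laboratory located in the United States."}
        ],
        max_tokens=100,
    )
    print(response["choices"][0]["message"]["content"])

The same GGUF file can also be loaded directly by llama.cpp's command-line tools if the Python bindings are not needed.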

License

The Granite-3.0-8B-Instruct model is released under the Apache 2.0 License, allowing for wide use and modification with proper attribution.
