Granite-3.1-1B-A400M-Instruct

ibm-granite

Introduction

Granite-3.1-1B-A400M-Instruct is a long-context instruct model developed by IBM's Granite Team. It is a sparse mixture-of-experts (MoE) model with roughly 1.3 billion total parameters, of which about 400 million are active per token (the "A400M" in the name). It is designed for a range of tasks including text summarization, classification, extraction, and multilingual dialog use cases. The model targets long-context problems and is fine-tuned using open-source instruction datasets and synthetic datasets.

Architecture

The model uses a decoder-only sparse mixture-of-experts (MoE) transformer architecture incorporating components such as GQA, RoPE, SwiGLU, and RMSNorm. It supports a context length of 128K tokens and multilingual capabilities across twelve languages. The configuration ranges below span the Granite 3.1 model family; the exact values for this variant can be checked as shown in the sketch after the list:

  • Embedding size: 2048 to 4096
  • Number of layers: 24 to 40
  • Attention head size: 64 to 128
  • Number of parameters: 2.5B to 8.1B
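
Since the ranges above cover multiple Granite 3.1 sizes, a quick way to see this checkpoint's exact values is to read its published configuration. The sketch below assumes the standard Hugging Face transformers config field names (hidden_size, num_hidden_layers, and so on); MoE-specific fields may be named differently in the released config.

    from transformers import AutoConfig

    # Load the published configuration for this checkpoint
    config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-1b-a400m-instruct")

    # Standard field names (assumed; verify against the actual config file)
    print(config.hidden_size)              # embedding size
    print(config.num_hidden_layers)        # number of layers
    print(config.num_attention_heads)      # attention heads
    print(config.max_position_embeddings)  # maximum context length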

Training

Granite-3.1-1B-A400M-Instruct was trained on IBM's Blue Vela supercomputing cluster, which is equipped with NVIDIA H100 GPUs. The training data includes public datasets, synthetic data, and human-curated examples. The model was developed with ethical design considerations, but users should still conduct their own safety testing for specific applications.

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Load and Run the Model (a sampling-based variation appears after this list):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "ibm-granite/granite-3.1-1b-a400m-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" places the model on an available GPU, falling back to CPU
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    model.eval()

    chat = [
        {"role": "user", "content": "Please list one IBM Research laboratory located in the United States."},
    ]
    # Build the prompt string using the model's chat template
    chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    # Move the input tensors to the same device the model was placed on
    input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_new_tokens=100)
    print(tokenizer.batch_decode(output, skip_special_tokens=True))
    
  3. Suggestions for Cloud GPUs: For faster inference, consider running the model on cloud GPU services such as AWS, Google Cloud, or Azure.
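
As a variation on the generate call in step 2, the snippet below enables sampling-based decoding for more varied responses. The temperature and top_p values are illustrative assumptions, not settings recommended by IBM:

    # Sampling-based generation (illustrative settings, not official defaults)
    output = model.generate(
        **input_tokens,
        max_new_tokens=100,
        do_sample=True,    # sample from the distribution instead of greedy decoding
        temperature=0.7,   # lower values make output more deterministic
        top_p=0.9,         # nucleus sampling: keep the smallest token set with 90% mass
    )
    print(tokenizer.batch_decode(output, skip_special_tokens=True))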

License

Granite-3.1-1B-A400M-Instruct is released under the Apache 2.0 License. This permissive license allows for use, distribution, and modification, provided that the license terms are met.
