Granite-3.1-3B-A800M-Instruct
ibm-granite/granite-3.1-3b-a800m-instruct
Introduction
Granite-3.1-3B-A800M-Instruct is a 3-billion-parameter model (roughly 800M parameters active per token) designed for long-context tasks. Developed by the Granite Team at IBM, it was trained with techniques such as supervised finetuning, reinforcement-learning-based alignment, and model merging, and it uses a structured chat format. It supports multiple languages and targets AI assistant use cases including summarization, text classification, and multilingual dialog.
Architecture
The model is based on a decoder-only sparse mixture-of-experts (MoE) transformer architecture (as the A800M suffix indicates, about 800M of its 3B parameters are active per token), featuring grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, and RMSNorm. It uses shared input/output embeddings, with an embedding size of 2048 and 40 layers, and supports sequence lengths of up to 128K tokens via RoPE.
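To make the component names concrete, here is a minimal PyTorch sketch of the RMSNorm and SwiGLU MLP building blocks. This is an illustration rather than IBM's implementation, and the dimensions used in the shape check are placeholders, not the model's published configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by 1/RMS(x), with no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLUMLP(nn.Module):
    """Gated feed-forward block: down(silu(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))

# Shape check with placeholder dimensions (not the model's actual config).
x = torch.randn(1, 8, 2048)
y = SwiGLUMLP(2048, 4096)(RMSNorm(2048)(x))
assert y.shape == x.shape
```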
Training
Training uses publicly available datasets, synthetic data targeting long-context tasks, and a small amount of human-curated data. It was carried out on IBM's Blue Vela supercomputing cluster, equipped with NVIDIA H100 GPUs. Ethical considerations include potential biases and inaccuracies, especially in multilingual contexts; these can be partially mitigated through few-shot prompting.
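As an illustration of few-shot prompting, prior worked turns can be prepended to the chat before the actual query so the model imitates the demonstrated behavior. The example strings below are hypothetical, not from IBM's documentation:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.1-3b-a800m-instruct")

# Few-shot prompting: earlier user/assistant turns demonstrate the desired
# behavior; the real query comes last. Example strings are illustrative only.
few_shot_chat = [
    {"role": "user", "content": "Translate to French: Good morning."},
    {"role": "assistant", "content": "Bonjour."},
    {"role": "user", "content": "Translate to French: Thank you very much."},
    {"role": "assistant", "content": "Merci beaucoup."},
    {"role": "user", "content": "Translate to French: See you tomorrow."},
]
prompt = tokenizer.apply_chat_template(few_shot_chat, tokenize=False, add_generation_prompt=True)
```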
Guide: Running Locally
- Install Required Libraries:
```bash
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
- Example Usage:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "ibm-granite/granite-3.1-3b-a800m-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# Build the prompt with the model's chat template.
chat = [{"role": "user", "content": "Please list one IBM Research laboratory located in the United States."}]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Tokenize, generate, and decode.
input_tokens = tokenizer(chat, return_tensors="pt").to(device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output))
```
A short optional post-processing sketch for extracting just the reply follows this list.
- Cloud GPUs: Consider cloud GPU instances (e.g., on AWS or GCP) if local hardware is insufficient for efficient model execution.
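Note that `generate` returns the prompt followed by the completion. Continuing from the Example Usage snippet above (reusing its `tokenizer`, `input_tokens`, and `output` variables), one way to keep only the model's reply:

```python
# Slice off the prompt tokens so only the newly generated reply is decoded.
prompt_len = input_tokens["input_ids"].shape[-1]
reply = tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
print(reply)
```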
License
The model is licensed under the Apache 2.0 License, allowing for broad use and distribution.
For further information, visit the Granite Docs.