Granite 3.1 1B A400M Instruct
ibm-granite
Introduction
Granite-3.1-1B-A400M-Instruct is a long-context instruct model developed by IBM's Granite Team. As its name indicates, it has about 1B total parameters (roughly 1.3B), of which about 400M are active for any given token. It is designed for tasks including text summarization, classification, extraction, and multilingual dialog. The model targets long-context problems and was fine-tuned using open-source instruction datasets and synthetic datasets.
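The "1B-A400M" in the name follows the convention of stating total and active parameter counts for a mixture-of-experts model. The sketch below reproduces that arithmetic as a back-of-the-envelope check. The configuration values are assumptions meant to match the published checkpoint (verify them against its `config.json`). Norms and biases are ignored, and the input and output embeddings are assumed to be tied.

```python
# Back-of-the-envelope parameter count for a sparse MoE decoder.
# ASSUMPTION: these config values are meant to match Granite-3.1-1B-A400M;
# check the checkpoint's config.json before relying on them.
CONFIG = {
    "vocab_size": 49155,
    "hidden_size": 1024,
    "num_layers": 24,
    "num_heads": 16,
    "num_kv_heads": 8,
    "expert_hidden_size": 512,  # width of each expert's MLP
    "num_experts": 32,
    "experts_per_token": 8,
}


def count_params(cfg):
    """Return (total, active) parameter counts, ignoring norms and biases."""
    d = cfg["hidden_size"]
    head_dim = d // cfg["num_heads"]
    kv_dim = cfg["num_kv_heads"] * head_dim
    # Q and O projections are d x d; grouped-query attention shrinks K and V.
    attn = 2 * d * d + 2 * d * kv_dim
    # A SwiGLU expert has gate, up, and down projections.
    expert = 3 * d * cfg["expert_hidden_size"]
    router = d * cfg["num_experts"]
    # ASSUMPTION: input and output embeddings are tied, so count them once.
    embed = cfg["vocab_size"] * d
    per_layer_shared = attn + router
    total = embed + cfg["num_layers"] * (per_layer_shared + cfg["num_experts"] * expert)
    active = embed + cfg["num_layers"] * (per_layer_shared + cfg["experts_per_token"] * expert)
    return total, active


total, active = count_params(CONFIG)
print(f"total ≈ {total / 1e9:.2f}B, active ≈ {active / 1e6:.0f}M")
# total ≈ 1.33B, active ≈ 429M
```

Only 8 of the 32 expert MLPs run for each token, which is why the active count comes out at roughly a third of the total.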
Architecture
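The model uses a decoder-only sparse mixture-of-experts (MoE) transformer architecture incorporating components such as GQA (grouped-query attention), RoPE (rotary position embeddings), SwiGLU, and RMSNorm. It supports a sequence length of 128K tokens and twelve languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Key configurations include:
- Embedding size: 1024
- Number of layers: 24
- Attention head size: 64 (16 query heads, 8 key/value heads)
- Number of experts: 32, with 8 active per token
- Number of parameters: 1.3B total, 400M active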
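To make three of the named components concrete, here is a minimal pure-Python sketch of RMSNorm, a SwiGLU feed-forward layer, and the grouped-query attention rule that maps query heads onto shared key/value heads. It is illustrative only, not IBM's implementation; the toy weights and the 16 query / 8 key-value head counts in the example are assumptions.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then scale by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU MLP: down(silu(gate(x)) * up(x)), with an elementwise product."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    hidden = [silu(g) * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def kv_head_for(query_head, num_heads, num_kv_heads):
    """Grouped-query attention: consecutive query heads share one key/value head."""
    group_size = num_heads // num_kv_heads
    return query_head // group_size


# ASSUMPTION: 16 query heads sharing 8 key/value heads, i.e. groups of two.
print([kv_head_for(h, 16, 8) for h in range(16)])
# [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7]
```

In an MoE layer, each expert is typically a SwiGLU block like `swiglu` above, and a router picks which experts process each token. Sharing key/value heads in GQA shrinks the key/value cache, which matters at long sequence lengths.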
Training
Granite-3.1-1B-A400M-Instruct was trained on IBM's Blue Vela supercomputing cluster using NVIDIA H100 GPUs. The training data includes public datasets, synthetic data, and human-curated examples. Although the model was developed with safety in mind, this does not guarantee safe behavior in every setting, so users should carry out safety testing tailored to their specific application.
Guide: Running Locally
To run the model locally, follow these steps:
- Install Required Libraries:
    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
- Load and Run the Model:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "ibm-granite/granite-3.1-1b-a400m-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" lets Accelerate place the weights on available GPUs, else the CPU
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    model.eval()

    chat = [
        {"role": "user", "content": "Please list one IBM Research laboratory located in the United States."},
    ]
    # Render the conversation with the model's chat template and open the assistant turn
    chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    # Move the inputs to wherever the model was placed ("auto" is not a valid tensor device)
    input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_new_tokens=100)
    print(tokenizer.batch_decode(output))
- Suggestions for Cloud GPUs: For faster inference than local hardware allows, consider cloud GPU services such as AWS, Google Cloud, or Azure.
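`model.generate` returns the prompt tokens followed by the newly generated ones, so the script in the steps above prints the prompt along with the answer, and nothing stops a very long prompt plus `max_new_tokens` from exceeding the 128K window. Two small helpers sketch how to handle both. Their names are hypothetical (they are not part of `transformers`), and the 131,072-token limit is an assumption based on the stated 128K sequence length.

```python
# ASSUMPTION: "128K" is taken to mean 128 * 1024 = 131072 tokens.
CONTEXT_WINDOW = 128 * 1024


def clamp_new_tokens(prompt_len, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Cap max_new_tokens so that prompt + completion fits in the context window."""
    if prompt_len >= context_window:
        raise ValueError(f"prompt uses {prompt_len} tokens; the limit is {context_window}")
    return min(max_new_tokens, context_window - prompt_len)


def completion_only(output_ids, prompt_len):
    """Drop the echoed prompt from each generated sequence, keeping only new tokens."""
    return [row[prompt_len:] for row in output_ids]


print(clamp_new_tokens(131000, 100))          # 72
print(completion_only([[1, 2, 3, 4, 5]], 3))  # [[4, 5]]
```

With the tensors from the script, the equivalent steps are `prompt_len = input_tokens["input_ids"].shape[1]`, passing `max_new_tokens=clamp_new_tokens(prompt_len, 100)` to `generate`, and decoding `output[:, prompt_len:]` with `skip_special_tokens=True`.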
License
Granite-3.1-1B-A400M-Instruct is released under the Apache 2.0 License. This permissive license allows for use, distribution, and modification, provided that the license terms are met.