Granite-3.1-2B-Instruct
Introduction
Granite-3.1-2B-Instruct is a 2-billion-parameter model developed by IBM's Granite Team. It is fine-tuned from the Granite-3.1-2B-Base model using a mix of open-source instruction datasets and internally generated synthetic datasets tailored to long-context tasks. The model supports multiple languages and is designed for a range of applications, including AI assistants.
Architecture
The model is based on a decoder-only dense transformer architecture, featuring GQA, RoPE, an MLP with SwiGLU, RMSNorm, and shared input/output embeddings. It consists of 40 layers with an embedding size of 2048 and 32 attention heads, and supports context lengths of up to 128K tokens.
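These hyperparameters can be read directly from the published checkpoint's configuration. The sketch below assumes the config exposes the standard Hugging Face attribute names (as Llama-style models do); check the checkpoint's config.json if any attribute differs:

```python
from transformers import AutoConfig

# Inspect the architecture hyperparameters described above.
# Attribute names follow the common Hugging Face convention and are an
# assumption here; consult the checkpoint's config.json to confirm.
config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-2b-instruct")

print(config.num_hidden_layers)        # expected: 40
print(config.hidden_size)              # expected: 2048 (embedding size)
print(config.num_attention_heads)      # expected: 32
print(config.num_key_value_heads)      # key/value heads used by GQA
print(config.max_position_embeddings)  # expected: 131072 (128K context)
```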
Training
The training data for Granite-3.1-2B-Instruct includes publicly available datasets, internal synthetic data, and a small amount of human-curated data. Training is conducted on IBM's Blue Vela supercomputing cluster, equipped with NVIDIA H100 GPUs; the underlying base model was trained on a total of 12 trillion tokens.
Guide: Running Locally
To run the Granite-3.1-2B-Instruct model locally, follow these steps:
- Install Required Libraries:

```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
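A quick import check confirms the installation; this is a minimal sketch, and the versions printed will depend on your environment:

```python
import torch
import transformers

# Verify the required libraries are importable and report whether a GPU
# is visible; CPU-only execution also works, just more slowly.
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```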
- Run the Model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.1-2b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# device_map="auto" lets accelerate place the weights on the available
# device(s); drop it to run on CPU.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

chat = [
    {
        "role": "user",
        "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location.",
    },
]

# Render the conversation with the model's chat template and append the
# generation prompt so the model responds as the assistant.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Move the input tensors to the device the model weights were placed on.
input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output))
```
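By default, `generate` uses greedy decoding. For more varied responses, sampling parameters can be passed in; the values below are illustrative assumptions, not official recommendations:

```python
# Sample from the output distribution instead of decoding greedily.
# The temperature and top_p values are illustrative, not tuned settings.
output = model.generate(
    **input_tokens,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.batch_decode(output, skip_special_tokens=True))
```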
- Cloud GPUs: For optimal performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure to run the model efficiently; a memory-saving loading option is sketched below.
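On GPUs with limited memory, loading the weights in half precision roughly halves the footprint relative to float32. This is a minimal sketch using the standard `torch_dtype` argument; bfloat16 requires hardware support (for example, NVIDIA Ampere-class GPUs or newer), so fall back to float16 where it is unavailable:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the weights in bfloat16 to cut GPU memory usage roughly in half
# compared to float32; use torch.float16 on GPUs without bfloat16 support.
model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.1-2b-instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```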
License
Granite-3.1-2B-Instruct is licensed under the Apache 2.0 License. For full terms, see the Apache License 2.0 text.