granite 3.1 3b a800m instruct

ibm-granite

Introduction

Granite-3.1-3B-A800M-Instruct is a 3 billion parameter instruction-tuned model designed for long-context tasks. Developed by the Granite Team at IBM, it was trained with a combination of supervised finetuning, reinforcement learning, and model merging, and it expects input in a structured chat format. It supports multiple languages and is intended for general AI assistant use, including summarization, text classification, and multilingual dialog.

Architecture

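The model is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture; the "A800M" in its name refers to the roughly 800 million parameters that are active per token. Its components include grouped-query attention (GQA), rotary position embeddings (RoPE), MLP layers with SwiGLU activation, and RMSNorm, and the input and output embeddings are shared. The embedding size is 1536 and the network has 32 layers. RoPE position embeddings allow sequence lengths of up to 128K tokens.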
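
To make two of the named components concrete, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU-gated MLP. It illustrates the general techniques only: the function names, the toy weight matrices, and the `eps` value are illustrative assumptions and are not taken from the Granite implementation.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP: down(silu(gate(x)) * up(x)), with an elementwise product."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


if __name__ == "__main__":
    x = [3.0, 4.0]
    normed = rms_norm(x, gain=[1.0, 1.0])
    # Toy weights (hypothetical): project 2 -> 3 -> 2 dimensions
    w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]
    w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    print(normed)
    print(swiglu_mlp(normed, w_gate, w_up, w_down))
```

In a real transformer these operations run on batched tensors with learned weights; the list-based version here only shows the arithmetic.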

Training

The training data consists of publicly available datasets, synthetic data targeting long-context tasks, and a small amount of human-curated data. Training ran on IBM’s Blue Vela supercomputing cluster, which is equipped with NVIDIA H100 GPUs. Ethical considerations include potential biases and inaccuracies. Output quality may also be lower in multilingual use than in English; supplying a few worked examples in the prompt (few-shot prompting) can help the model produce more accurate answers in those cases.
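
The few-shot idea can be sketched as follows: worked examples are placed in the conversation as earlier user/assistant turns, ahead of the real question. The helper name `build_few_shot_chat` and the sample sentiment-classification pairs are hypothetical; the resulting list uses the role/content message shape that chat templates in the `transformers` library accept.

```python
def build_few_shot_chat(examples, query, instruction=None):
    """Return a chat message list with worked examples before the real query.

    examples: list of (user_text, assistant_text) pairs shown as prior turns.
    query: the actual user question to answer.
    instruction: optional task description prepended to the first user turn.
    """
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    if instruction:
        first = messages[0]
        first["content"] = f"{instruction}\n\n{first['content']}"
    return messages


if __name__ == "__main__":
    # Hypothetical examples for a Spanish sentiment-classification task
    examples = [
        ("Me encanta este producto.", "positivo"),
        ("El servicio fue muy lento.", "negativo"),
    ]
    chat = build_few_shot_chat(
        examples,
        "La entrega llegó antes de lo esperado.",
        instruction="Clasifica el sentimiento como positivo o negativo.",
    )
    for message in chat:
        print(message["role"], "->", message["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template(...)` in place of a single-message chat.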

Guide: Running Locally

  1. Install Required Libraries:
    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Example Usage:
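    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_path = "ibm-granite/granite-3.1-3b-a800m-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" lets accelerate place the weights on the available device(s)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    model.eval()
    
    # Build the prompt string with the model's chat template
    chat = [{"role": "user", "content": "Please list one IBM Research laboratory located in the United States."}]
    prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    # "auto" is not a valid torch device, so move the inputs to wherever the model was placed
    input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Gradients are not needed for inference
    with torch.no_grad():
        output = model.generate(**input_tokens, max_new_tokens=100)
    # Decode the full sequence (prompt plus generated tokens)
    output = tokenizer.batch_decode(output)
    print(output)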
    
  3. Cloud GPUs: If no suitable local GPU is available, consider running the model on a GPU instance from a cloud provider such as AWS or GCP.
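
When sizing a local or cloud GPU, a rough first check is whether the weights alone fit in memory. The sketch below is back-of-the-envelope arithmetic under stated assumptions: it takes the nominal 3 billion parameter count from the introduction, uses standard byte widths per data type, and ignores activations, the KV cache (which grows with context length), and framework overhead, so real usage will be higher. The function name `weight_memory_gib` is made up for this example.

```python
# Bytes used to store one parameter in common data types
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="bfloat16"):
    """Estimate the memory needed for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Nominal parameter count (assumption: the exact count differs somewhat)
    n = 3_000_000_000
    for dtype in ("float32", "bfloat16", "int8"):
        print(f"{dtype}: {weight_memory_gib(n, dtype):.1f} GiB")
```

Leaving generous headroom above these figures is advisable, particularly for prompts that approach the 128K context limit.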

License

The model is licensed under the Apache 2.0 License, allowing for broad use and distribution.

For further information, visit the Granite Docs.
