Granite-3.1-2B-Instruct

ibm-granite

Introduction

Granite-3.1-2B-Instruct is a 2 billion parameter model developed by IBM's Granite Team. It is fine-tuned from the Granite-3.1-2B-Base model on a mix of open-source instruction datasets and internally generated synthetic datasets tailored to long-context tasks. The model supports multiple languages and is designed for a range of applications, including AI assistants.

Architecture

The model is based on a decoder-only dense transformer architecture, featuring grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. It consists of 40 layers with an embedding size of 2048 and uses 32 attention heads, and it supports context lengths of up to 128K tokens.
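
These hyperparameters can be read directly from the published model configuration. A minimal sketch, assuming the standard Hugging Face transformers config field names apply (the expected values are the ones quoted above):

    from transformers import AutoConfig

    # Download and inspect the model configuration (no weights needed)
    config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-2b-instruct")
    print(config.num_hidden_layers)        # expected: 40
    print(config.hidden_size)              # expected: 2048
    print(config.num_attention_heads)      # expected: 32
    print(config.max_position_embeddings)  # maximum context length (128K)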

Training

The training data for Granite-3.1-2B-Instruct includes publicly available datasets, internal synthetic data, and a small amount of human-curated data. Training was conducted on IBM's Blue Vela supercomputing cluster, equipped with NVIDIA H100 GPUs, using a total of 12 trillion training tokens.

Guide: Running Locally

To run the Granite-3.1-2B-Instruct model locally, follow these steps:

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Run the Model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Pick a concrete device: "auto" is only valid for device_map,
    # not for tensor .to() calls
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model_path = "ibm-granite/granite-3.1-2b-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
    model.eval()

    chat = [
        { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
    ]
    # Render the conversation with the model's chat template,
    # appending the assistant turn the model should complete
    chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    input_tokens = tokenizer(chat, return_tensors="pt").to(device)
    output = model.generate(**input_tokens, max_new_tokens=100)
    output = tokenizer.batch_decode(output)
    print(output)
    
  3. Cloud GPUs: For best performance, run the model on a GPU; cloud providers such as AWS, Google Cloud, or Azure offer suitable instances.
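
As an alternative to the step-by-step script above, recent versions of transformers let a text-generation pipeline consume chat messages directly and apply the chat template for you. A minimal sketch, assuming a reasonably current transformers release and accelerate installed:

    from transformers import pipeline

    # Build a text-generation pipeline; device_map="auto" places the model
    # on a GPU when one is available (requires accelerate)
    pipe = pipeline(
        "text-generation",
        model="ibm-granite/granite-3.1-2b-instruct",
        device_map="auto",
    )

    messages = [
        {"role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."},
    ]

    # Passing chat messages makes the pipeline apply the chat template
    # automatically; the result contains the full conversation, with the
    # model's reply as the last message
    result = pipe(messages, max_new_tokens=100)
    print(result[0]["generated_text"][-1]["content"])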

License

Granite-3.1-2B-Instruct is licensed under the Apache 2.0 License. For details, see the Apache License 2.0 text.
