Granite-3.1-1B-A400M-Instruct

ibm-granite

Introduction

Granite-3.1-1B-A400M-Instruct is a long-context instruct model developed by IBM's Granite Team. It is a sparse mixture-of-experts (MoE) model with roughly 1.3 billion total parameters, of which about 400 million are active per token (the "A400M" in the name). It is designed for a range of tasks including text summarization, classification, extraction, and multilingual dialog use cases. The model targets long-context problems and is fine-tuned using open-source instruction datasets and synthetic datasets.

Architecture

The model uses a decoder-only sparse mixture-of-experts (MoE) transformer architecture incorporating components such as GQA, RoPE, SwiGLU, and RMSNorm. It supports a context length of 128K tokens and multilingual capabilities across twelve languages. The configuration ranges below span the Granite 3.1 model family; the exact values for this variant can be checked as shown in the sketch after the list:

  • Embedding size: 2048 to 4096
  • Number of layers: 24 to 40
  • Attention head size: 64 to 128
  • Number of parameters: 2.5B to 8.1B
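
Since the ranges above cover multiple Granite 3.1 sizes, a quick way to see this checkpoint's exact values is to read its published configuration. The sketch below assumes the standard Hugging Face transformers config field names (hidden_size, num_hidden_layers, and so on); MoE-specific fields may be named differently in the released config.

    from transformers import AutoConfig

    # Load the published configuration for this checkpoint
    config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-1b-a400m-instruct")

    # Standard field names (assumed; verify against the actual config file)
    print(config.hidden_size)              # embedding size
    print(config.num_hidden_layers)        # number of layers
    print(config.num_attention_heads)      # attention heads
    print(config.max_position_embeddings)  # maximum context length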

Training

Granite-3.1-1B-A400M-Instruct was trained on IBM's Blue Vela supercomputing cluster, which is equipped with NVIDIA H100 GPUs. The training data includes public datasets, synthetic data, and human-curated examples. The model was developed with ethical design considerations, but users should still conduct their own safety testing for specific applications.

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Load and Run the Model (a sampling-based variation appears after this list):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "ibm-granite/granite-3.1-1b-a400m-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" places the model on an available GPU, falling back to CPU
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    model.eval()

    chat = [
        {"role": "user", "content": "Please list one IBM Research laboratory located in the United States."},
    ]
    # Build the prompt string using the model's chat template
    chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    # Move the input tensors to the same device the model was placed on
    input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_new_tokens=100)
    print(tokenizer.batch_decode(output, skip_special_tokens=True))
    
  3. Suggestions for Cloud GPUs: For faster inference, consider running the model on cloud GPU services such as AWS, Google Cloud, or Azure.
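
As a variation on the generate call in step 2, the snippet below enables sampling-based decoding for more varied responses. The temperature and top_p values are illustrative assumptions, not settings recommended by IBM:

    # Sampling-based generation (illustrative settings, not official defaults)
    output = model.generate(
        **input_tokens,
        max_new_tokens=100,
        do_sample=True,    # sample from the distribution instead of greedy decoding
        temperature=0.7,   # lower values make output more deterministic
        top_p=0.9,         # nucleus sampling: keep the smallest token set with 90% mass
    )
    print(tokenizer.batch_decode(output, skip_special_tokens=True))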

License

Granite-3.1-1B-A400M-Instruct is released under the Apache 2.0 License. This permissive license allows for use, distribution, and modification, provided that the license terms are met.
