Granite-3.1-1B-A400M-Base

ibm-granite

Introduction

The Granite-3.1-1B-A400M-Base model is an advanced language model developed by IBM's Granite Team. It extends the context length capabilities of its predecessor, Granite-3.0, to 128K tokens using a progressive training strategy. The model is designed for various text generation tasks across multiple languages and domains.

Architecture

Granite-3.1-1B-A400M-Base is built on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. Key components include fine-grained experts, dropless token routing, and a load-balancing auxiliary loss. The model has 1.3 billion total parameters, of which roughly 400 million are active per token during inference. It supports sequence lengths of up to 128K tokens and uses rotary position embeddings (RoPE).
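The load-balancing component is typically an auxiliary loss that encourages the router to spread tokens evenly across experts. The sketch below illustrates the standard Switch-Transformer-style formulation only; the function name, tensor shapes, and top-k value are assumptions for illustration and are not taken from IBM's training code.

    import torch

    def load_balancing_loss(router_logits: torch.Tensor, num_experts: int, top_k: int) -> torch.Tensor:
        # router_logits: (num_tokens, num_experts) raw scores produced by the router
        probs = torch.softmax(router_logits, dim=-1)
        top_idx = probs.topk(top_k, dim=-1).indices                 # experts selected for each token
        mask = torch.zeros_like(probs).scatter_(-1, top_idx, 1.0)   # one-hot routing mask
        tokens_per_expert = mask.mean(dim=0)                        # f_i: fraction of tokens routed to expert i
        prob_per_expert = probs.mean(dim=0)                         # P_i: mean router probability for expert i
        return num_experts * torch.sum(tokens_per_expert * prob_per_expert)

    # Example: 8 tokens routed among 32 experts with top-2 routing
    logits = torch.randn(8, 32)
    print(load_balancing_loss(logits, num_experts=32, top_k=2))

The loss is minimized when routing fractions and router probabilities are uniform across experts, which discourages the router from collapsing onto a small subset of experts.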

Training

The model is trained using a three-stage strategy on a mix of open-source and proprietary data. Stage 1 covers a diverse set of domains, Stage 2 adds high-quality multilingual and instructional data to improve performance, and Stage 3 introduces synthetic long-context data to extend the context window. Training was conducted on IBM's Blue Vela supercomputing cluster, which is equipped with NVIDIA H100 GPUs.
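To verify the extended context window after loading the model with transformers, the configuration can be inspected directly. The field names below (max_position_embeddings, rope_theta) are standard for Hugging Face decoder configs and are an assumption here, not something stated by this card.

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-1b-a400m-base")
    print(config.max_position_embeddings)       # expected to reflect the 128K token window
    print(getattr(config, "rope_theta", None))  # RoPE base frequency, if exposed by this config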

Guide: Running Locally

  1. Install Required Libraries:
    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Run Example Code:
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_path = "ibm-granite/granite-3.1-1b-a400m-base"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # device_map="auto" lets Accelerate place the weights on a GPU when one is available
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    model.eval()
    
    input_text = "Where is the Thomas J. Watson Research Center located?"
    # move the tokenized inputs to the same device as the model weights
    input_tokens = tokenizer(input_text, return_tensors="pt").to(model.device)
    output = model.generate(**input_tokens, max_length=4000)
    output = tokenizer.batch_decode(output)
    print(output)
    
  3. Recommendations: For optimal performance, consider using cloud-based GPUs offered by platforms such as AWS, Google Cloud, or Azure. Loading the model in reduced precision also lowers memory requirements; see the sketch after this list.
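A minimal sketch of loading the model in reduced precision to cut the memory footprint roughly in half; the torch_dtype argument and bfloat16 hardware support are assumptions, not requirements stated in this guide.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "ibm-granite/granite-3.1-1b-a400m-base"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # assumption: the target GPU supports bfloat16; fall back to float16 or float32 otherwise
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",
        torch_dtype=torch.bfloat16,  # half-precision weights
    )
    model.eval()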

License

The Granite-3.1-1B-A400M-Base model is released under the Apache 2.0 License, which allows for both personal and commercial use with proper attribution.
