Granite-3.0-8B-Instruct-GGUF

QuantFactory

Introduction

Granite-3.0-8B-Instruct is a language model developed by IBM's Granite Team for text generation tasks; this repository provides GGUF-quantized builds of it. The model has been finetuned on open-source and synthetic datasets to handle tasks such as summarization, text classification, and question answering, and it supports multiple languages.

Architecture

Granite-3.0-8B-Instruct is based on a decoder-only dense transformer architecture. Key features include grouped-query attention (GQA), rotary position embeddings (RoPE), SwiGLU MLPs, RMSNorm, and shared input/output embeddings. The Granite 3.0 family offers several dense and mixture-of-experts (MoE) configurations, with active parameters ranging from 400 million to 8.1 billion; this model is the 8B dense variant. It operates with a sequence length of 4096 tokens.
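
To verify these details against a given checkpoint, the configuration can be inspected without downloading the weights. The sketch below assumes the standard transformers AutoConfig API and Llama-style attribute names (num_key_value_heads, rope_theta, and so on); any attribute not present in this model's config class simply prints as "n/a".

    from transformers import AutoConfig

    # Load only the configuration (no weights) of the full-precision checkpoint
    config = AutoConfig.from_pretrained("ibm-granite/granite-3.0-8b-instruct")

    # Fields corresponding to the architectural choices above; a
    # num_key_value_heads value smaller than num_attention_heads indicates GQA
    for name in ["hidden_size", "num_attention_heads", "num_key_value_heads",
                 "hidden_act", "max_position_embeddings", "rope_theta",
                 "tie_word_embeddings"]:
        print(name, getattr(config, name, "n/a"))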

Training

The model was trained on IBM's Blue Vela supercomputing cluster using NVIDIA H100 GPUs. The training data comprises publicly available datasets, internal synthetic data, and human-curated data. As with other large language models, it may generate biased or unsafe responses, so safety testing is recommended before deployment.

Guide: Running Locally

  1. Install Required Libraries:

    pip install torch torchvision torchaudio
    pip install accelerate
    pip install transformers
    
  2. Load and Run the Model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Pick a concrete device; "auto" is valid for device_map but not for .to()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model_path = "ibm-granite/granite-3.0-8b-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
    model.eval()

    chat = [
        { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
    ]
    # Apply the model's chat template and append the generation prompt
    chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
    # Tokenize and move the inputs to the same device as the model
    input_tokens = tokenizer(chat, return_tensors="pt").to(device)
    output = model.generate(**input_tokens, max_new_tokens=100)
    output = tokenizer.batch_decode(output)
    print(output)
    
  3. Using Cloud GPUs:
    Consider running the model on cloud platforms like AWS, Google Cloud, or Azure, which offer GPU instances to handle large models efficiently.
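
Because this repository distributes GGUF quantizations, the weights can also be run with llama.cpp-based tooling instead of the full-precision Transformers checkpoint. The sketch below is a non-authoritative example using the llama-cpp-python bindings; the repository id is assumed from this page's title, and the GGUF filename pattern is a placeholder to be replaced with a file actually published in the repository.

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Repo id assumed from this page. The filename glob is a placeholder;
    # substitute one of the GGUF files actually listed in the repository.
    llm = Llama.from_pretrained(
        repo_id="QuantFactory/granite-3.0-8b-instruct-GGUF",
        filename="*Q4_K_M.gguf",
        n_ctx=4096,  # matches the model's 4096-token sequence length
    )

    response = llm.create_chat_completion(
        messages=[
            {"role": "user", "content": "Please list one IBM Research laboratory located in the United States."}
        ],
        max_tokens=100,
    )
    print(response["choices"][0]["message"]["content"])

The same GGUF file can also be loaded directly by llama.cpp's command-line tools if the Python bindings are not needed.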

License

The Granite-3.0-8B-Instruct model is released under the Apache 2.0 License, allowing for wide use and modification with proper attribution.
