Teuken-7B-instruct-commercial-v0.4-GGUF

Maintained by QuantFactory

Introduction

Teuken-7B-instruct-commercial-v0.4-GGUF is a quantized (GGUF) version of Teuken-7B-instruct-commercial-v0.4, an instruction-tuned multilingual large language model (LLM). It supports the 24 official European languages and was developed within the OpenGPT-X project with contributions from various institutions.

Architecture

The model is a transformer-based, decoder-only architecture with the following specifications:

  • Parameters: 7B
  • Sequence Length: 4096
  • Layers: 32
  • Hidden Size: 4096
  • Feedforward Network Size: 13440
  • Attention Heads: 32
  • Position Embeddings: Rotary
  • Normalization: RMSNorm
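
These hyperparameters can be read back from the configuration that ships with the model repository. A minimal inspection sketch follows; since Teuken uses a custom configuration class, printing the whole config is the reliable way to see the exact field names:

    from transformers import AutoConfig
    
    # trust_remote_code=True is required because Teuken ships custom model code.
    config = AutoConfig.from_pretrained(
        "openGPT-X/Teuken-7B-instruct-commercial-v0.4", trust_remote_code=True
    )
    # Print every field (layer count, hidden size, FFN size, ...) instead of
    # guessing attribute names on a custom configuration class.
    print(config)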

Training

The base model, Teuken-7B-base-v0.4, was pre-trained on 4 trillion tokens from publicly available sources with a cutoff of September 2023. The instruct version was then fine-tuned on English and German instruction data, together with translations into the 22 other supported European languages. Training used bf16 mixed precision on the JUWELS Booster infrastructure with NVIDIA A100 GPUs.

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Required Libraries:

    • transformers
    • sentencepiece
    • torch
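
    These can be installed with pip (a sketch assuming a standard Python environment):

    pip install transformers sentencepiece torch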
  2. Load Model and Tokenizer:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    # Run on a GPU if one is available; the bf16 weights need roughly 14 GB.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model_name = "openGPT-X/Teuken-7B-instruct-commercial-v0.4"
    # trust_remote_code=True is required because Teuken ships custom modeling code.
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, torch_dtype=torch.bfloat16).to(device).eval()
    # use_fast=False selects the slow (SentencePiece) tokenizer the model expects.
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False, trust_remote_code=True)
    
  3. Generate Text:

    # Build a prompt with the German ("DE") system-prompt chat template.
    # The user message "Wer bist du?" means "Who are you?".
    messages = [{"role": "User", "content": "Wer bist du?"}]
    prompt_ids = tokenizer.apply_chat_template(messages, chat_template="DE", tokenize=True, add_generation_prompt=True, return_tensors="pt")
    # Sample a completion; max_length caps prompt plus generated tokens at 512.
    prediction = model.generate(prompt_ids.to(model.device), max_length=512, do_sample=True, top_k=50, top_p=0.95, temperature=0.7, num_return_sequences=1)
    # The decoded output includes the prompt as well as the model's answer.
    print(tokenizer.decode(prediction[0].tolist()))
    
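
Note that the steps above run the original bf16 weights through transformers. To run the GGUF quantization itself, a llama.cpp-based runtime is needed; below is a minimal sketch using llama-cpp-python, where the GGUF file name is an assumed placeholder for whichever quantization level you download from the QuantFactory repository:

    from llama_cpp import Llama
    
    # model_path points to a locally downloaded GGUF file; the file name below
    # is an assumption, so substitute the quantization you actually fetched.
    llm = Llama(model_path="Teuken-7B-instruct-commercial-v0.4.Q4_K_M.gguf", n_ctx=4096)
    
    # Plain-text completion; for chat-style use, format the prompt with the
    # model's chat template as in step 3 above.
    output = llm("Wer bist du?", max_tokens=256, temperature=0.7, top_p=0.95)
    print(output["choices"][0]["text"])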

For optimal performance, a cloud GPU such as an NVIDIA A100 is recommended.

License

The Teuken-7B-instruct-commercial-v0.4-GGUF model is released under the Apache 2.0 license, which allows for both commercial and non-commercial use.
