EXAONE 3.5 7.8B Instruct

LGAI-EXAONE

Introduction

EXAONE 3.5 is a series of instruction-tuned bilingual (English and Korean) generative models developed by LG AI Research. The series ranges from 2.4B to 32B parameters, trading off capability against deployment cost. All models support context lengths of up to 32K tokens and show strong performance on real-world use cases and long-context understanding.

Architecture

This repository hosts the instruction-tuned 7.8B model, which has the following configuration (the sketch after this list shows how to read these values from the model config):

  • Parameters (excluding embeddings): 6.98B
  • Layers: 32
  • Attention Heads: GQA with 32 Q-heads and 8 KV-heads
  • Vocab Size: 102,400
  • Context Length: 32,768 tokens
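
The same numbers can be read programmatically from the model's config. A minimal sketch, assuming the config exposes the usual Hugging Face attribute names; the EXAONE config class may name some fields differently, hence the defensive hasattr check:

    from transformers import AutoConfig

    # trust_remote_code is needed because EXAONE ships a custom config/model class.
    config = AutoConfig.from_pretrained(
        "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct", trust_remote_code=True
    )
    # Attribute names below are assumptions based on common HF conventions.
    for name in ("num_hidden_layers", "num_layers", "num_attention_heads",
                 "num_key_value_heads", "vocab_size", "max_position_embeddings"):
        if hasattr(config, name):
            print(f"{name} = {getattr(config, name)}")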

Training

EXAONE 3.5 models were trained with system prompts, so they can follow instructions supplied through the system role at inference time. Evaluated across a range of real-world use cases, they demonstrate competitive performance against similarly sized models.
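
Because the chat template is what injects the system prompt into the model input, it can help to inspect the rendered prompt string directly. A minimal sketch, using only the standard apply_chat_template API; the exact template text is defined by the model's tokenizer configuration:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct")
    messages = [
        {"role": "system",
         "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ]
    # tokenize=False returns the formatted prompt string rather than token ids,
    # making the system-prompt placement visible.
    print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))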

Guide: Running Locally

  1. Install Requirements:

    • transformers v4.43 or later is required.
    • Install PyTorch and the transformers library, e.g. pip install torch "transformers>=4.43".
  2. Code Snippet for Inference:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct"
    
    # trust_remote_code is required because EXAONE uses a custom model class.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    prompt = "Explain how wonderful you are"  # English example
    # The model is bilingual; a Korean prompt works the same way, e.g.:
    # prompt = "스스로를 자랑해 봐"  # Korean example
    messages = [
        {"role": "system",
         "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    # Render the conversation with the model's chat template and tokenize it.
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    )
    
    # Greedy decoding (do_sample=False), capped at 128 new tokens.
    output = model.generate(
        input_ids.to("cuda"),
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=128,
        do_sample=False,
    )
    print(tokenizer.decode(output[0]))
    
  3. Cloud GPUs:

    • If no suitable local GPU is available, cloud platforms such as AWS, GCP, or Azure provide GPU instances that can serve the model efficiently; a rough sizing estimate is sketched after this list.
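
To gauge which GPU tier is needed, a back-of-the-envelope weight-memory estimate helps. A minimal sketch, assuming bfloat16 weights (2 bytes per parameter) and counting weights only; the KV cache and activations add overhead on top, especially at long context lengths:

    # Rough VRAM needed to hold the 7.8B weights in bfloat16.
    params = 7.8e9           # total parameter count
    bytes_per_param = 2      # bfloat16
    weight_gib = params * bytes_per_param / 1024**3
    print(f"~{weight_gib:.1f} GiB for weights alone")  # ~14.5 GiB

Under these assumptions, a single 24 GB-class GPU holds the weights with some headroom, while full 32K-token contexts may require a larger card or a multi-GPU setup via device_map="auto".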

License

The EXAONE model is licensed under the EXAONE AI Model License Agreement 1.1 - NC.
