EXAONE 3.5 2.4B Instruct

LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct

Introduction

EXAONE 3.5 is a series of instruction-tuned bilingual (English and Korean) generative models developed by LG AI Research, available in sizes from 2.4 billion to 32 billion parameters and supporting long-context processing of up to 32K tokens. The 2.4B model is optimized for small or resource-constrained environments, while the larger models deliver stronger performance. Across the series, the models excel at real-world use cases and remain competitive in general domains against similarly sized models.

Architecture

The 2.4B model uses the following configuration:

  • Parameters (excluding embeddings): 2.14 billion
  • Layers: 30
  • Attention Heads: 32 query heads and 8 key-value heads (grouped-query attention)
  • Vocabulary Size: 102,400
  • Context Length: 32,768 tokens
  • Tie Word Embeddings: Enabled (unlike the 7.8B and 32B models, which use separate input and output embeddings)
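
These values can be verified against the published model configuration. The following is a minimal sketch, assuming the Transformers library and Hugging Face Hub access; max_position_embeddings is the conventional attribute name and is assumed here, while print(config) shows the authoritative field names:

    from transformers import AutoConfig

    # Fetch the model's configuration from the Hub (requires network access)
    config = AutoConfig.from_pretrained(
        "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct",
        trust_remote_code=True,  # EXAONE ships a custom configuration class
    )
    print(config)                           # dumps every field, including layer and head counts
    print(config.vocab_size)                # 102400
    print(config.max_position_embeddings)   # 32768 (assumed attribute name)
    print(config.tie_word_embeddings)       # True for the 2.4B model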

Training

EXAONE 3.5 models are instruction-tuned to handle a variety of real-world scenarios; training details are described in the technical report available on arXiv. The models have been evaluated on a range of benchmarks spanning real-world use cases, long-context understanding, and general-domain tasks, with strong results in each category.

Guide: Running Locally

To run EXAONE 3.5-2.4B locally, follow these steps:

  1. Install the required version of the Transformers library (v4.43 or later).
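
  For example, with pip:

    pip install "transformers>=4.43"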

  2. Use the provided Python script to load and run the model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,  # required: EXAONE uses custom model code
        device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    # Choose a prompt; the second assignment overwrites the first,
    # so comment out the language you do not want.
    prompt = "Explain how wonderful you are"  # English
    prompt = "스스로를 자랑해 봐"  # Korean ("Brag about yourself")
    
    messages = [
        {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    )
    
    output = model.generate(
        input_ids.to(model.device),  # works on GPU or CPU, wherever device_map placed the model
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=128,
        do_sample=False,  # greedy decoding for deterministic output
    )
    print(tokenizer.decode(output[0]))  # pass skip_special_tokens=True to hide template tokens
    
  3. To optimize performance, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure, or a quantized load as sketched below.
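
For tighter memory budgets, the model can also be loaded with 4-bit quantization. This is a minimal sketch rather than an official recipe; it assumes the optional bitsandbytes package is installed and a CUDA GPU is available:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # Load weights in 4-bit precision, cutting memory use roughly 4x vs bf16
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
    )
    model = AutoModelForCausalLM.from_pretrained(
        "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct",
        quantization_config=quant_config,
        trust_remote_code=True,
        device_map="auto",
    )

The tokenizer and generation code from step 2 work unchanged with the quantized model.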

License

The use of EXAONE 3.5 models is governed by the EXAONE AI Model License Agreement 1.1 - NC. For more details, refer to the license document.
