EXAONE 3.5 2.4B Instruct
LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
Introduction
EXAONE 3.5 is a series of bilingual (English and Korean) instruction-tuned generative models developed by LG AI Research, with parameter counts ranging from 2.4 billion to 32 billion and long-context support up to 32K tokens. The 2.4B model is optimized for small or resource-constrained environments, while the larger models offer stronger performance. Across the series, the models excel in real-world use cases and remain competitive in general domains against similarly sized models.
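The advertised 32K context window can be verified without downloading the full weights by inspecting the model configuration. A minimal sketch, assuming the EXAONE config exposes the conventional `max_position_embeddings` field used by most Transformers models:

```python
from transformers import AutoConfig

# Load only the configuration (no weights). trust_remote_code is required
# because EXAONE ships a custom model class on the Hub.
config = AutoConfig.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct", trust_remote_code=True
)

# Should report 32768 for the EXAONE 3.5 models.
print(config.max_position_embeddings)
```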
Architecture
The 2.4B model uses the following configuration:
- Parameters (excluding embeddings): 2.14 billion
- Layers: 30
- Attention Heads: 32 Q-heads and 8 KV-heads
- Vocabulary Size: 102,400
- Context Length: 32,768 tokens
- Tied Word Embeddings: Yes (unlike the 7.8B and 32B models)
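The reported 2.14B non-embedding parameter count and the tied embeddings can be checked directly, since with tied embeddings the input embedding matrix (102,400 × hidden size) is the only embedding tensor. A minimal sketch, assuming the custom EXAONE class follows the standard `PreTrainedModel` API:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

total = sum(p.numel() for p in model.parameters())
# Subtracting the (tied) input embedding matrix should leave roughly
# the reported 2.14B non-embedding parameters.
embedding = model.get_input_embeddings().weight.numel()
print(f"total: {total / 1e9:.2f}B, excluding embeddings: {(total - embedding) / 1e9:.2f}B")
```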
Training
EXAONE 3.5 models are instruction-tuned for a variety of real-world scenarios; training details are given in the technical report available on arXiv. In the reported evaluations, the models lead on real-world instruction-following and long-context benchmarks while remaining competitive with similarly sized models on general-domain benchmarks.
Guide: Running Locally
To run EXAONE 3.5 2.4B Instruct locally, follow these steps:
1. Install the Transformers library (v4.43 or later). For example:
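```bash
pip install "transformers>=4.43"
```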
2. Load and run the model with the following Python script:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose your prompt (the model is bilingual)
prompt = "Explain how wonderful you are"  # English example
prompt = "스스로를 자랑해 봐"              # Korean example ("Brag about yourself")

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

output = model.generate(
    input_ids.to(model.device),  # works whether the model landed on GPU or CPU
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
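For interactive use, generated tokens can be printed as they are produced rather than after generation finishes. A minimal variation on the script above using Transformers' built-in `TextStreamer` (reusing the `model`, `tokenizer`, and `input_ids` objects already created):

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated;
# skip_prompt hides the echoed chat template.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids.to(model.device),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
    streamer=streamer,
)
```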
3. If local hardware is limited, consider cloud-based GPU services such as AWS, Google Cloud, or Azure.
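As a rough, unofficial sizing estimate: 2.4 billion parameters at 2 bytes each in bfloat16 amount to about 4.8 GB for the weights alone, before activations and the KV cache (which grows with context length), so a GPU with at least 8 GB of memory is a reasonable starting point for short-context use.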
License
The use of EXAONE 3.5 models is governed by the EXAONE AI Model License Agreement 1.1 - NC. For more details, refer to the license document.