EXAONE 3.5 2.4B Instruct
LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
Introduction
EXAONE 3.5 is a series of bilingual (English and Korean) instruction-tuned generative models developed by LG AI Research, with parameter counts ranging from 2.4 billion to 32 billion and long-context support up to 32K tokens. The 2.4B model is optimized for small or resource-constrained environments, while the larger models offer stronger performance. Across the series, the models excel in real-world use cases and remain competitive in general domains against similarly sized models.
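The advertised 32K context window can be verified without downloading the full weights by inspecting the model configuration. A minimal sketch, assuming the EXAONE config exposes the conventional `max_position_embeddings` field used by most Transformers models:

```python
from transformers import AutoConfig

# Load only the configuration (no weights). trust_remote_code is required
# because EXAONE ships a custom model class on the Hub.
config = AutoConfig.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct", trust_remote_code=True
)

# Should report 32768 for the EXAONE 3.5 models.
print(config.max_position_embeddings)
```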
Architecture
The 2.4B model uses the following configuration:
- Parameters (excluding embeddings): 2.14 billion
- Layers: 30
- Attention Heads: 32 Q-heads and 8 KV-heads
- Vocabulary Size: 102,400
- Context Length: 32,768 tokens
- Tied Word Embeddings: Yes (unlike the 7.8B and 32B models)
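The reported 2.14B non-embedding parameter count and the tied embeddings can be checked directly, since with tied embeddings the input embedding matrix (102,400 × hidden size) is the only embedding tensor. A minimal sketch, assuming the custom EXAONE class follows the standard `PreTrainedModel` API:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

total = sum(p.numel() for p in model.parameters())
# Subtracting the (tied) input embedding matrix should leave roughly
# the reported 2.14B non-embedding parameters.
embedding = model.get_input_embeddings().weight.numel()
print(f"total: {total / 1e9:.2f}B, excluding embeddings: {(total - embedding) / 1e9:.2f}B")
```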
Training
EXAONE 3.5 models are instruction-tuned for a variety of real-world scenarios; training details are given in the technical report available on arXiv. In the reported evaluations, the models lead on real-world instruction-following and long-context benchmarks while remaining competitive with similarly sized models on general-domain benchmarks.
Guide: Running Locally
To run EXAONE 3.5 2.4B Instruct locally, follow these steps:
1. Install the Transformers library (v4.43 or later). For example:
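```bash
pip install "transformers>=4.43"
```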
2. Load and run the model with the following Python script:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose your prompt (the model is bilingual)
prompt = "Explain how wonderful you are"  # English example
prompt = "스스로를 자랑해 봐"              # Korean example ("Brag about yourself")

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)

output = model.generate(
    input_ids.to(model.device),  # works whether the model landed on GPU or CPU
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
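For interactive use, generated tokens can be printed as they are produced rather than after generation finishes. A minimal variation on the script above using Transformers' built-in `TextStreamer` (reusing the `model`, `tokenizer`, and `input_ids` objects already created):

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated;
# skip_prompt hides the echoed chat template.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids.to(model.device),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
    streamer=streamer,
)
```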
3. If local hardware is limited, consider cloud-based GPU services such as AWS, Google Cloud, or Azure.
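As a rough, unofficial sizing estimate: 2.4 billion parameters at 2 bytes each in bfloat16 amount to about 4.8 GB for the weights alone, before activations and the KV cache (which grows with context length), so a GPU with at least 8 GB of memory is a reasonable starting point for short-context use.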
License
The use of EXAONE 3.5 models is governed by the EXAONE AI Model License Agreement 1.1 - NC. For more details, refer to the license document.