EXAONE 3.5 32B Instruct

LGAI-EXAONE

Introduction

EXAONE 3.5 is a series of instruction-tuned, bilingual (English and Korean) generative language models developed by LG AI Research. These models range from 2.4 billion to 32 billion parameters, optimized for various deployment scenarios, including small devices and high-performance applications. They support long-context processing of up to 32K tokens and demonstrate state-of-the-art performance in real-world settings and general domains.

Architecture

The EXAONE 3.5-32B model features:

  • Parameters (without embeddings): 30.95 billion
  • Layers: 64
  • Attention Heads: GQA with 40 Q-heads and 8 KV-heads
  • Vocabulary Size: 102,400
  • Context Length: 32,768 tokens
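The GQA figures above make it easy to estimate KV-cache cost. A back-of-envelope sketch, assuming a head dimension of 128 (not stated above; the hidden size would then be 40 × 128 = 5,120) and bfloat16 (2-byte) cache entries:

```python
# KV-cache size estimate for EXAONE 3.5-32B; head_dim is an assumption.
layers = 64
kv_heads = 8          # GQA: 40 query heads share 8 KV heads
head_dim = 128        # assumed, not specified in the model card
bytes_per_value = 2   # bfloat16

# Per token, each layer caches one K and one V tensor over the KV heads.
kv_bytes_per_token = 2 * kv_heads * head_dim * bytes_per_value * layers
kv_gib_full_context = kv_bytes_per_token * 32768 / 2**30

print(kv_bytes_per_token)    # 262144 bytes (256 KiB) per token
print(kv_gib_full_context)   # 8.0 GiB at the full 32,768-token context
```

Under these assumptions, grouping 40 query heads onto 8 KV heads shrinks the cache five-fold compared with full multi-head attention.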

Training

EXAONE 3.5 models are trained to utilize system prompts effectively. They have been evaluated in various real-world scenarios, showing competitive results against similarly sized models.

Guide: Running Locally

Basic Steps

  1. Install the Transformers library: ensure you have version 4.43 or later.
  2. Load the Model and Tokenizer:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "LGAI-EXAONE/EXAONE-3.5-32B-Instruct"
    
    # trust_remote_code is required: EXAONE ships custom model code on the Hub.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,   # half-precision weights
        trust_remote_code=True,
        device_map="auto"             # spread layers across available GPUs
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  3. Run inference: apply the tokenizer's chat template to your conversation and pass the result to model.generate().
  4. GPU Recommendation: For optimal performance, use a cloud GPU service like AWS EC2 with NVIDIA GPUs or Google Cloud's AI Platform.
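Step 3 can be sketched as follows. This is a minimal illustration, not an official snippet; the system prompt text and generation settings are illustrative, and generate_reply assumes the model and tokenizer loaded in step 2.

```python
def build_messages(system_prompt: str, user_prompt: str) -> list:
    """Assemble a chat in the role/content format used by apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def generate_reply(model, tokenizer, messages, max_new_tokens=128):
    """Run one generation turn (requires the model/tokenizer from step 2)."""
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(
        input_ids.to(model.device),
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

messages = build_messages(
    "You are EXAONE model from LG AI Research, a helpful assistant.",
    "Explain who you are in one sentence.",
)
# reply = generate_reply(model, tokenizer, messages)  # needs the loaded model
```

Since EXAONE 3.5 is trained to use system prompts, keeping a system message as the first turn generally gives better-behaved outputs than a bare user message.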

Cloud GPUs

Cloud GPUs are recommended because the model's computational requirements exceed most local hardware: the 32B model's bfloat16 weights alone occupy roughly 64 GB, more memory than any single consumer GPU provides.
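The arithmetic behind that recommendation, as a rough sketch that treats the model as ~32 billion parameters stored in bfloat16 (ignoring KV cache and activations, which add more on top):

```python
# Weight-memory estimate for a ~32B-parameter model in bfloat16.
params = 32e9         # approximate parameter count
bytes_per_param = 2   # bfloat16
weight_gb = params * bytes_per_param / 1e9
print(weight_gb)      # 64.0 GB for the weights alone
```

In practice this means multi-GPU setups (which device_map="auto" handles) or a single high-memory accelerator such as an 80 GB-class card.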

License

The EXAONE 3.5 model is licensed under the EXAONE AI Model License Agreement 1.1 - NC. For full details, refer to the license document.
