L3.3 70B Euryale v2.3
Sao10K/L3.3-70B-EURYALE-V2.3 Model Documentation
Introduction
L3.3-70B-EURYALE-V2.3 is the direct successor to Euryale v2.2. It is built on top of Llama 3.3 Instruct and aims to deliver improved performance in text generation tasks.
Architecture
The model was fine-tuned with the Axolotl framework, version 0.5.2, and is based on meta-llama/Llama-3.3-70B-Instruct. It uses AutoModelForCausalLM as the model class and AutoTokenizer for tokenization. Key architectural features include:
- Sequence length: 16384
- Flash attention: Enabled
- Adapter: LoRA (Low-Rank Adaptation)
- lora_r: 128
- lora_alpha: 16
- lora_dropout: 0.1
- lora_target_linear: true
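Put together, the relevant portion of an Axolotl config corresponding to the settings above might look like this sketch (field names follow Axolotl's YAML schema; only the values listed on this card are taken from the source, the layout itself is assumed):

```yaml
base_model: meta-llama/Llama-3.3-70B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

sequence_len: 16384
flash_attention: true

adapter: lora
lora_r: 128
lora_alpha: 16
lora_dropout: 0.1
lora_target_linear: true
```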
Training
The model was trained using a variety of datasets, each tailored for specific instructive and creative purposes. Training details include:
- Warmup steps: 15
- Number of epochs: 1
- Gradient accumulation steps: 4
- Micro batch size: 1
- Optimizer: paged_ademamix_8bit
- Learning rate scheduler: Cosine
- Learning rate: 0.000004
- Weight decay: 0.1
- Max gradient norm: 25.0
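The warmup and cosine-scheduler settings above can be sketched as a small function (a generic linear-warmup plus cosine-decay rule; the total step count used below is an arbitrary placeholder, not a value from this card):

```python
import math

PEAK_LR = 4e-6       # learning rate from the card
WARMUP_STEPS = 15    # warmup steps from the card

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size = micro batch size x gradient accumulation steps
EFFECTIVE_BATCH = 1 * 4
```

With a micro batch size of 1 and 4 gradient accumulation steps, the effective batch size per device is 4.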
Guide: Running Locally
To run the model locally, follow these steps:
- Ensure you have a compatible environment with PyTorch and Transformers installed.
- Clone the repository containing the model files.
- Load the model using the transformers library.
- Configure the model settings according to your requirements (e.g., temperature, prompt format).
For optimal performance, using a cloud GPU is recommended. Providers like AWS, Google Cloud, or Azure offer robust GPU options.
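The steps above can be sketched with the transformers library as follows. This is a minimal illustration, not the author's reference code: the repo id is assumed from this card's title, the Llama 3 Instruct prompt format is assumed as the chat template, and the loader is only defined (not called) because the 70B weights require substantial GPU memory:

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a prompt in the Llama 3 Instruct format (assumed for this model)."""
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

def load_model(model_id: str = "Sao10K/L3.3-70B-EURYALE-V2.3"):
    """Load tokenizer and model; run this only on hardware with enough GPU memory."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumed dtype; adjust to your hardware
        device_map="auto",           # shard across available GPUs
    )
    return tokenizer, model

# Sampling settings such as temperature are passed at generation time, e.g.:
# tokenizer, model = load_model()
# inputs = tokenizer(build_prompt("You are a helpful writer.", "Hello!"),
#                    return_tensors="pt").to(model.device)
# output = model.generate(**inputs, max_new_tokens=256, temperature=1.1, do_sample=True)
# print(tokenizer.decode(output[0], skip_special_tokens=True))
```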
License
The model is available under the llama3 license, which dictates the terms of use and distribution. Users should ensure compliance with the license when utilizing the model for their purposes.