gemma 2 Ifable 9 B
ifableIntroduction
GEMMA-2-IFABLE-9B is a text generation model developed by IFABLE. It is known for ranking first on the Creative Writing Benchmark as of September 10, 2024. The model is designed for creative writing applications and utilizes advanced techniques for training and evaluation.
Architecture
The model is built using the Transformers library, leveraging the powerful capabilities of this architecture for text generation tasks. It is optimized for creative writing through a combination of datasets and the SimPO training method.
Training
Training and Evaluation Data
- Gutenberg Dataset: Used for training and available at Gutenberg DPO v0.1.
- Proprietary Dataset: A carefully curated dataset specific to creative writing.
Training Procedure
- Method: SimPO (Simple Preference Optimization with a Reference-Free Reward).
- Results:
- Loss: 1.0163
- Rewards/chosen: -21.6822
- Rewards/rejected: -47.8754
- Rewards/accuracies: 0.9167
- Rewards/margins: 26.1931
- Logps/rejected: -4.7875
- Logps/chosen: -2.1682
- Logits/rejected: -17.0475
- Logits/chosen: -12.0041
Training Hyperparameters
- Learning Rate: 8e-07
- Batch Size: Train: 1, Eval: 1
- Seed: 42
- Distributed Type: Multi-GPU
- Number of Devices: 8
- Gradient Accumulation Steps: 16
- Total Train Batch Size: 128
- Total Eval Batch Size: 8
- Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- Scheduler Type: Cosine
- Warmup Ratio: 0.1
- Epochs: 1.0
Framework Versions
- Transformers: 4.43.4
- PyTorch: 2.3.0a0+ebedce2
- Datasets: 2.20.0
- Tokenizers: 0.19.1
Guide: Running Locally
- Prerequisites: Install Transformers, PyTorch, and other dependencies.
- Clone the Repository: Clone the model repository from Hugging Face.
- Load the Model: Use the Transformers library to load the model.
- Inference: Run inference locally using sample scripts provided in the repository.
Suggested Cloud GPUs
- AWS EC2: Use instances with GPU support like p3 or g4dn.
- Google Cloud Platform: Leverage GPU instances such as Tesla T4.
- Azure: Consider using NV-series VMs optimized for AI workloads.
License
The model is licensed under the GEMMA license. For detailed terms, refer to the model repository on Hugging Face.