Luminum v0.1 123B
FluffyKaeloky
Introduction
Luminum-123B is a merged model built on Mistral Large as the base, combined with Lumimaid-v0.2-123B and Magnum-v2-123B. It aims to balance the strengths of its components, retaining Mistral's reasoning ability ("brain power") while integrating Lumimaid's lexicon and Magnum's creative flair, yielding a model suited to generating coherent, detailed text.
Architecture
Luminum-123B is based on a merged architecture using the following components:
- Base Model: Mistral Large
- Merged Models: Lumimaid-v0.2-123B and Magnum-v2-123B
- Configuration: Uses the `della_linear` merge method with specific weights and densities for each model component.
Merge Details
The model was created by merging its components with the following parameters:
- Weights and Densities: Lumimaid (weight: 0.34, density: 0.8), Magnum (weight: 0.19, density: 0.5)
- Merge Method: della_linear
- Base Model: mistralai/Mistral-Large-Instruct-2407
- Additional Parameters: epsilon (0.05), lambda (1), int8_mask (true), dtype (bfloat16)
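Taken together, these values map onto a mergekit-style configuration. The sketch below is reconstructed from the parameters above rather than copied from the original config; the repository identifiers for Lumimaid and Magnum are assumptions.

```yaml
# Hypothetical mergekit config reconstructed from the parameters above.
# The Lumimaid and Magnum repository ids are assumed, not given in this card.
models:
  - model: NeverSleep/Lumimaid-v0.2-123B      # assumed repo id
    parameters:
      weight: 0.34
      density: 0.8
  - model: anthracite-org/magnum-v2-123b      # assumed repo id
    parameters:
      weight: 0.19
      density: 0.5
merge_method: della_linear
base_model: mistralai/Mistral-Large-Instruct-2407
parameters:
  epsilon: 0.05
  lambda: 1
  int8_mask: true
dtype: bfloat16
```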
Guide: Running Locally
To run Luminum-123B locally, follow these steps:
- Install Prerequisites: Ensure Python and the necessary libraries, such as `transformers`, are installed.
- Download Model: Access the model files from the repository, ensuring all dependencies are downloaded.
- Set Up Environment: Configure your environment to use a high-performance GPU. Suggested cloud GPU providers include AWS, Google Cloud, and Azure.
- Run the Model: Use the Hugging Face Transformers library to load and run the model locally with your specified input (a minimal sketch follows this list).
- Model Settings: Recommended sampler settings: Min-p (0.08), repetition penalty (1.03), repetition penalty range (4096), smoothing factor (0.23), no-repeat n-gram size (2).
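The snippet below is a minimal sketch of loading and running the model with Transformers, applying the recommended settings that have direct `generate()` equivalents. The repository id is an assumption inferred from the model name; smoothing factor and repetition penalty range are frontend-specific samplers (e.g., in text-generation-webui or SillyTavern) with no standard Transformers kwarg.

```python
# Minimal sketch: load Luminum-123B with Transformers and generate text.
# Assumptions: the repo id below is inferred from the model name;
# `min_p` requires a recent transformers release; `device_map="auto"`
# (via accelerate) shards the 123B weights across available GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FluffyKaeloky/Luminum-v0.1-123B"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge dtype
    device_map="auto",           # spread layers across GPUs
)

prompt = "Write a short scene set in a rain-soaked city at night."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Recommended settings from this card, where supported by generate();
# smoothing factor and rep-penalty range are frontend-specific samplers.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    min_p=0.08,
    repetition_penalty=1.03,
    no_repeat_ngram_size=2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that at bfloat16 precision a 123B-parameter model occupies roughly 250 GB of weights, so expect to need several high-memory GPUs or a quantized variant to run it in practice.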
License
Luminum-123B is subject to the licensing terms and conditions of its upstream models. Users should review those terms to ensure compliance with any restrictions on usage or distribution.