MadMix-Unleashed-12B
ThijsL202
Introduction
MadMix-Unleashed-12B is a language model designed for text generation tasks. It was created by combining two existing models through a method known as SLERP merging, and it can be loaded with the Hugging Face Transformers library.
Architecture
The architecture of MadMix-Unleashed-12B is a result of merging two distinct models:
- MarinaraSpaghetti/NemoMix-Unleashed-12B
- DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-MADNESS
The merge blends the two models' parameters layer by layer along a "V shaped curve": NemoMix is used for the input and output layers, and MN-GRAND for the middle layers. The merge is configured with the bfloat16 data type for efficient computation.
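To make the "V shaped curve" concrete, the sketch below computes a per-layer blend factor t, where t = 0 means the layer comes entirely from NemoMix and t = 1 entirely from MN-GRAND, so the NemoMix share (1 - t) traces the V. The anchor values, the 40-layer count, and the function name are assumptions made for illustration; the source does not state the actual merge configuration.

```python
def v_shaped_weights(num_layers, anchors=(0.0, 0.5, 1.0, 0.5, 0.0)):
    """Spread anchor values across layers by piecewise-linear interpolation.

    The default anchors are hypothetical: 0.0 at the first and last layers
    (all NemoMix) rising to 1.0 in the middle (all MN-GRAND).
    """
    if num_layers < 2:
        raise ValueError("need at least two layers")
    last = len(anchors) - 1
    weights = []
    for layer in range(num_layers):
        # Map the layer index onto the anchor axis [0, last].
        pos = layer / (num_layers - 1) * last
        lo = min(int(pos), last - 1)
        frac = pos - lo
        weights.append(anchors[lo] * (1 - frac) + anchors[lo + 1] * frac)
    return weights


# Assumed layer count, for illustration only.
weights = v_shaped_weights(40)
for layer in (0, 10, 20, 30, 39):
    print(f"layer {layer:2d}: t = {weights[layer]:.2f}")
```

The first and last layers get t = 0, and the factor rises toward 1 in the middle of the stack.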
Training
The model was constructed using the SLERP (spherical linear interpolation) merge method, facilitated by MergeKit. Rather than averaging weights along a straight line, SLERP interpolates between the two models' parameters along an arc, aiming to enhance performance by combining the strengths of each base model.
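As a rough illustration of what SLERP does to a pair of weight vectors, here is a minimal pure-Python sketch of the textbook formula. It is not MergeKit's implementation, which operates on full tensors; the function name and the fallback to plain linear interpolation for near-parallel or zero vectors are choices made for this example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t = 0 returns v0 and t = 1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    linear = [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    if norm0 < eps or norm1 < eps:
        # The angle is undefined for a zero vector; use linear interpolation.
        return linear
    # Cosine of the angle between the vectors, clamped for acos.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly parallel or antiparallel vectors: avoid dividing by ~0.
        return linear
    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


midpoint = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
print(midpoint)
```

For two orthogonal unit vectors, the midpoint stays on the unit circle, whereas a plain average would shrink its length to about 0.71.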
Guide: Running Locally
To run MadMix-Unleashed-12B locally, follow these steps:
- Install Dependencies: Ensure you have Python installed, along with the Hugging Face Transformers library and PyTorch.
    pip install transformers torch
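- Download the Model: Access the model through the Hugging Face model hub and download it to your local machine. Calling from_pretrained with the model ID downloads and caches the files automatically on first use.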
- Load and Run the Model: Use the Transformers library to load the model and perform inference.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ThijsL202/MadMix-Unleashed-12B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Load the weights in bfloat16, the data type specified for the merge.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    input_text = "Once upon a time"
    inputs = tokenizer(input_text, return_tensors="pt")
    # Set an explicit limit on the number of generated tokens.
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- Cloud GPU Recommendation: A 12B-parameter model has substantial memory and compute demands. If your local hardware is not sufficient, consider cloud-based GPUs, such as those available on AWS, Google Cloud, or Azure.
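A back-of-the-envelope estimate helps explain the GPU recommendation. The sketch below multiplies a parameter count by the bytes per parameter to get the memory needed for the weights alone. The figure of 12 billion parameters is an assumption read off the "12B" in the model name, and the result excludes activations and the generation cache, so real usage will be higher.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory needed to hold the weights, in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param defaults to 2, the size of one bfloat16 value.
    """
    return num_params * bytes_per_param / 1e9


# Assumed parameter count, taken from the "12B" in the model name.
NOMINAL_PARAMS = 12e9

bf16_gb = weight_memory_gb(NOMINAL_PARAMS)
fp32_gb = weight_memory_gb(NOMINAL_PARAMS, bytes_per_param=4)
print(f"bfloat16 weights: about {bf16_gb:.0f} GB")
print(f"float32 weights:  about {fp32_gb:.0f} GB")
```

Under these assumptions the bfloat16 weights alone take about 24 GB, which is at or beyond the memory of many consumer GPUs.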
License
The MadMix-Unleashed-12B model is released under an open-source license that allows use, modification, and distribution in accordance with its terms. Check the model's page on the Hugging Face model hub for the exact license, and ensure compliance with it when using the model.