MadMix-Unleashed-12B

ThijsL202

Introduction

MadMix-Unleashed-12B is a language model for text generation. It is built by merging two existing models with the SLERP (spherical linear interpolation) method, and it can be loaded and run through the Hugging Face Transformers library.

Architecture

The architecture of MadMix-Unleashed-12B is a result of merging two distinct models:

  • MarinaraSpaghetti/NemoMix-Unleashed-12B
  • DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-12B-MADNESS

The merge follows a "V-shaped curve" across the layer stack: it is weighted toward NemoMix at the input and output layers and toward MN-GRAND in the middle layers. The merge is configured with the bfloat16 data type, which uses 2 bytes per parameter instead of the 4 needed for float32.
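To make the V shape concrete, here is a minimal sketch of how a short list of anchor values can be stretched into one blend weight per layer. The anchor values, the layer count of 40, and the names `layer_weights`, `V_ANCHORS`, and `NUM_LAYERS` are all assumptions for illustration; the actual values used for this merge are not given here.

```python
def layer_weights(anchors, num_layers):
    """Linearly interpolate a short list of anchor values across num_layers layers."""
    if num_layers == 1:
        return [anchors[0]]
    weights = []
    for i in range(num_layers):
        # Position of this layer on the anchor axis, from 0 to len(anchors) - 1
        pos = i * (len(anchors) - 1) / (num_layers - 1)
        lo = int(pos)
        hi = min(lo + 1, len(anchors) - 1)
        frac = pos - lo
        weights.append((1 - frac) * anchors[lo] + frac * anchors[hi])
    return weights


# Hypothetical anchors: share of NemoMix at the first, middle, and last layer.
V_ANCHORS = [1.0, 0.0, 1.0]
NUM_LAYERS = 40  # assumed layer count, not stated in this document

nemomix_share = layer_weights(V_ANCHORS, NUM_LAYERS)
for index in (0, NUM_LAYERS // 2, NUM_LAYERS - 1):
    print(f"layer {index}: NemoMix share {nemomix_share[index]:.2f}")
```

Plotted against the layer index, the NemoMix share starts high, dips in the middle, and rises again, which is the "V" the description refers to.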

Training

The model was produced by merging rather than by a conventional training run. It was built with MergeKit using the SLERP merge method, which interpolates between the corresponding parameters of the two base models with the aim of combining the strengths of each.
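The interpolation step itself can be sketched in a few lines. This is a plain-Python illustration of the standard SLERP formula on two small vectors, not MergeKit's implementation; the function name `slerp` and the `eps` threshold are assumptions made for the example.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    t = 0 returns v0, t = 1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        # No direction to interpolate along: use plain linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Angle between the two vectors, from the dot product of their unit forms
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    omega = math.acos(dot)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Nearly parallel vectors: linear interpolation is numerically safer
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two perpendicular unit vectors: roughly [0.707, 0.707]
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Unlike a straight average, which would give [0.5, 0.5] here, the result stays on the unit circle, so the interpolated vector keeps the same length as the inputs.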

Guide: Running Locally

To run MadMix-Unleashed-12B locally, follow these steps:

  1. Install Dependencies: Ensure you have Python installed, along with the Hugging Face Transformers library and PyTorch, which the loading example in step 3 relies on.

    pip install transformers torch
    
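  2. Download the Model: The weights are hosted on the Hugging Face model hub. They are downloaded and cached on your local machine the first time from_pretrained is called in step 3, so no separate download step is required.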

  3. Load and Run the Model: Use the Transformers library to load the model and perform inference.

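    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_id = "ThijsL202/MadMix-Unleashed-12B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Load the weights in bfloat16, the data type the merge was configured with
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    
    input_text = "Once upon a time"
    inputs = tokenizer(input_text, return_tensors="pt")
    # Set the continuation length explicitly instead of relying on the default
    outputs = model.generate(**inputs, max_new_tokens=50)
    # Drop special tokens such as the beginning-of-sequence marker from the output
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))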
    
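  4. Cloud GPU Recommendation: A model with about 12 billion parameters needs roughly 24 GB for its weights alone in bfloat16 (2 bytes per parameter), so a GPU with enough memory is recommended. If you do not have one locally, cloud GPUs such as those available on AWS, Google Cloud, or Azure can handle the computational demands of the model.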
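As a rough check on the computational demands mentioned in step 4, the memory taken by the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes a nominal 12 billion parameters, as the "12B" in the model name suggests; the exact count is not stated here, and activations and the generation cache add to the total. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for the model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 12e9  # assumed nominal count implied by "12B"

for dtype_name, bytes_per_param in [("float32", 4), ("bfloat16", 2)]:
    size = weight_memory_gb(NUM_PARAMS, bytes_per_param)
    print(f"{dtype_name}: about {size:.0f} GB for weights")
```

Under these assumptions, bfloat16 needs about half the memory of float32, which is the main practical reason to load the model in that data type.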

License

The MadMix-Unleashed-12B model is released under an open-source license that permits use, modification, and distribution under its terms. Check the license stated on the model's Hugging Face page and comply with it when using the model.
