Noromaid v0.1 Mixtral 8x7B v3 GPTQ

TheBloke

Introduction

The Noromaid v0.1 Mixtral 8x7B v3 GPTQ model is a GPTQ quantization of Noromaid v0.1 Mixtral 8x7B v3, a text generation model created by IkariDev and Undi and published on Hugging Face by TheBloke. It is provided in multiple quantization parameter combinations so that text generation can run efficiently on different hardware configurations.

Architecture

  • Model Type: Mixtral
  • Model Creator: IkariDev and Undi
  • Quantized by: TheBloke
  • Supported Platforms: Linux (NVIDIA/AMD) and Windows (NVIDIA) for the GPTQ files; macOS users should use the GGUF versions of the model instead
  • Model Variants: includes versions supporting 2- to 8-bit quantization for CPU+GPU inference (a branch-loading sketch follows this list)
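
TheBloke's GPTQ repos typically expose each quantization parameter combination as a separate repository branch. A minimal loading sketch, assuming a branch named gptq-4bit-32g-actorder_True (a common naming pattern in these repos; check the actual branch list on the Hugging Face model page before use):

    from transformers import AutoModelForCausalLM

    # revision selects a quantization branch; the branch name here is an
    # assumption -- verify it against the repo's branch list on Hugging Face
    model = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Noromaid-v0.1-mixtral-8x7b-v3-GPTQ",
        revision="gptq-4bit-32g-actorder_True",
        device_map="auto",
    )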

Training

The model was trained on a combination of datasets, including Aesir, LimaRP, and others. It went through several iterations, with training runs of 8 hours for v1, 8 hours for v2, and 12 hours for v3. Training focused on role-playing (RP), uncensoring, and a modified Alpaca prompt format; a prompt-assembly sketch follows.
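
The exact modifications to the Alpaca format are not documented here, so the sketch below assembles the standard Alpaca layout that the inference example later in this guide uses; the helper function name is illustrative, not part of any API:

    def build_prompt(instruction: str, user_input: str = "") -> str:
        # Standard Alpaca-style sections; Noromaid's "modified" variant may
        # differ in details not covered by this card
        prompt = f"### Instruction:\n{instruction}\n\n"
        if user_input:
            prompt += f"### Input:\n{user_input}\n\n"
        return prompt + "### Response:\n"

    print(build_prompt("You are a story writing assistant", "Write a story about llamas"))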

Guide: Running Locally

Basic Steps

  1. Install Prerequisites:

    pip3 install --upgrade transformers optimum auto-gptq
    
    • If using PyTorch 2.1 + CUDA 11.x, add the extra index URL:
      pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
      
  2. Download the Model (a Python alternative using huggingface_hub is sketched after step 3):

    huggingface-cli download TheBloke/Noromaid-v0.1-mixtral-8x7b-v3-GPTQ --local-dir Noromaid-v0.1-mixtral-8x7b-v3-GPTQ
    
  3. Run Inference: Use the following Python code to generate text (a streaming variant is sketched right after this block):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name_or_path = "TheBloke/Noromaid-v0.1-mixtral-8x7b-v3-GPTQ"
    # device_map="auto" requires the accelerate package and places the
    # quantized weights on the available GPU(s)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
    
    # Alpaca-style prompt, as used by the Noromaid models
    prompt_template = """### Instruction:
    You are a story writing assistant
    
    ### Input:
    Write a story about llamas
    
    ### Response:
    """
    
    # Move the tokenized prompt onto the same device as the model
    input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.to(model.device)
    # do_sample=True is needed for the temperature setting to take effect
    output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
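
For interactive use, you can print tokens as they are generated instead of waiting for the full completion. A sketch using transformers' TextStreamer, reusing the model, tokenizer, and prompt_template objects from the code above:

    from transformers import TextStreamer

    # Prints each new token to stdout as soon as it is generated;
    # skip_prompt=True suppresses echoing the input prompt
    streamer = TextStreamer(tokenizer, skip_prompt=True)
    input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to(model.device)
    model.generate(inputs=input_ids, streamer=streamer, do_sample=True,
                   temperature=0.7, max_new_tokens=512)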
    

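If you prefer to stay in Python rather than use the CLI, huggingface_hub's snapshot_download covers the same ground as the huggingface-cli command in step 2. A minimal sketch:

    from huggingface_hub import snapshot_download

    # Python equivalent of the huggingface-cli download command in step 2:
    # fetches every file from the repo's main branch into local_dir
    snapshot_download(
        repo_id="TheBloke/Noromaid-v0.1-mixtral-8x7b-v3-GPTQ",
        local_dir="Noromaid-v0.1-mixtral-8x7b-v3-GPTQ",
    )
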
Cloud GPUs

For large-scale inference, consider cloud GPU services such as AWS, Google Cloud, or Azure, which offer hardware with enough VRAM for a model of this size.

License

The Noromaid v0.1 Mixtral 8x7B v3 GPTQ model is released under the Creative Commons BY-NC 4.0 license, which allows non-commercial use with appropriate credit.
