Noromaid v0.1 Mixtral 8x7B v3 GPTQ
Maintained by TheBloke
Introduction
The Noromaid v0.1 Mixtral 8x7B v3 GPTQ model, created by IkariDev and Undi, is a text generation model available on Hugging Face. It is published in several quantized variants so that inference can be tuned to different hardware configurations.
Architecture
- Model Type: Mixtral
- Model Creator: IkariDev and Undi
- Quantized by: TheBloke
- Supported Platforms: Linux (NVIDIA/AMD), Windows (NVIDIA); GGUF models are available for macOS
- Model Variants: Includes versions supporting 2-8 bit quantization for CPU+GPU inference
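TheBloke's GPTQ repositories typically publish each quantization variant on its own branch, so a specific variant can be selected with the revision argument when loading. A minimal sketch, assuming a branch name in the usual naming style (check the repository's branch list for the variants actually published):

from transformers import AutoModelForCausalLM

# "gptq-4bit-32g-actorder_True" is an assumed example branch name;
# consult the repository's branch list for the real ones.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Noromaid-v0.1-mixtral-8x7b-v3-GPTQ",
    device_map="auto",
    revision="gptq-4bit-32g-actorder_True",
)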
Training
The model was trained on a combination of datasets, including Aesir, LimaRP, and others. It went through several iterations, with training times of 8 hours for v1, 8 hours for v2, and 12 hours for v3. The training focused on role-play (RP) and uncensored output, using a modified Alpaca prompt format.
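For reference, an Alpaca-style prompt is organized into ### Instruction:, ### Input:, and ### Response: sections, as in the inference example further below. The helper here is a minimal sketch of how such a prompt can be assembled; the exact "modified" template may differ, so treat it as illustrative:

# Hypothetical helper illustrating the Alpaca-style prompt layout;
# the model's actual modified template may differ.
def build_prompt(instruction: str, user_input: str) -> str:
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Input:\n"
        f"{user_input}\n\n"
        "### Response:\n"
    )

print(build_prompt("You are a story writing assistant", "Write a story about llamas"))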
Guide: Running Locally
Basic Steps
- Install Prerequisites:
pip3 install --upgrade transformers optimum auto-gptq
- If using PyTorch 2.1 + CUDA 11.x, add the extra index URL:
pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
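To confirm whether your environment matches that combination before adding the extra index URL, you can inspect the installed PyTorch build with a quick check:

import torch

# Prints the PyTorch build and the CUDA toolkit it was compiled against,
# e.g. "2.1.0+cu118" and "11.8"; use this to decide if the cu118 wheel applies.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())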
- Download the Model:
huggingface-cli download TheBloke/Noromaid-v0.1-mixtral-8x7b-v3-GPTQ --local-dir Noromaid-v0.1-mixtral-8x7b-v3-GPTQ
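Alternatively, the download can be scripted with the huggingface_hub Python API; a minimal sketch that fetches the same repository snapshot:

from huggingface_hub import snapshot_download

# Downloads every file from the repo's default branch into the given directory.
snapshot_download(
    repo_id="TheBloke/Noromaid-v0.1-mixtral-8x7b-v3-GPTQ",
    local_dir="Noromaid-v0.1-mixtral-8x7b-v3-GPTQ",
)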
- Run Inference: Use the following Python code to perform text generation:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Noromaid-v0.1-mixtral-8x7b-v3-GPTQ"

# Load the quantized model and tokenizer; device_map="auto" spreads the
# weights across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Alpaca-style prompt template.
prompt_template = """### Instruction:
You are a story writing assistant

### Input:
Write a story about llamas

### Response:
"""

input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
# do_sample=True is needed for the temperature setting to take effect.
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
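Since pipeline is already imported, generation can also be driven through the higher-level pipeline API. A brief sketch reusing the model and tokenizer loaded above; the sampling values are illustrative, not tuned:

# Compact alternative using the transformers pipeline API.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    do_sample=True,
    temperature=0.7,   # illustrative sampling settings
    top_p=0.95,
    max_new_tokens=512,
)
print(pipe(prompt_template)[0]["generated_text"])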
Cloud GPUs
To optimize performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure, which provide powerful hardware setups suitable for large-scale model inference.
License
The Noromaid v0.1 Mixtral 8x7B v3 GPTQ model is released under the Creative Commons BY-NC 4.0 license, which allows non-commercial use with appropriate credit.