Wizard-Vicuna-30B-Uncensored-GPTQ

TheBloke

Introduction

Wizard-Vicuna-30B-Uncensored-GPTQ is a GPTQ-quantized language model based on Eric Hartford's Wizard-Vicuna-30B-Uncensored model. It is designed to provide helpful, detailed, and polite responses in a conversational format, and it is available in several quantized variants to suit different hardware configurations.

Architecture

The model is built on the Llama architecture and is offered in multiple quantization options, including 2, 3, 4, 5, 6, and 8-bit precision. Each setting trades memory usage against inference quality, so users can choose the variant that best matches their VRAM budget and accuracy requirements.
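
As a rough way to reason about these options, the weight footprint alone scales linearly with bit width. The sketch below is an illustrative estimate only; it deliberately ignores the KV cache, activations, and per-group quantization metadata, all of which add to real VRAM usage.

    # Back-of-the-envelope weight-memory estimate for a ~30B-parameter model.
    PARAMS = 30e9  # approximate parameter count

    for bits in (2, 3, 4, 5, 6, 8, 16):
        gib = PARAMS * bits / 8 / 1024**3
        print(f"{bits:>2}-bit weights: ~{gib:.1f} GiB")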

Training

The model was created by applying GPTQ quantization to the original Wizard-Vicuna-30B-Uncensored model. The wikitext dataset was used as calibration data during quantization, which is distinct from the dataset used to train the original model. The quantization process is designed to preserve accuracy while reducing compute and memory demands.
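
For reference, a comparable quantization can be reproduced with the GPTQ integration in transformers. This is a minimal sketch under assumed settings; the bit width, group size, and the ehartford/Wizard-Vicuna-30B-Uncensored source repository are illustrative, not the exact parameters used for this release.

    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    base_model = "ehartford/Wizard-Vicuna-30B-Uncensored"  # assumed source repo
    tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)

    # Calibrate on wikitext2 and quantize to 4-bit; values here are illustrative.
    gptq_config = GPTQConfig(bits=4, group_size=128, dataset="wikitext2", tokenizer=tokenizer)

    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=gptq_config,
        device_map="auto",
    )
    model.save_pretrained("Wizard-Vicuna-30B-Uncensored-GPTQ")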

Guide: Running Locally

Basic Steps

  1. Install Prerequisites: Ensure you have Python installed along with transformers, optimum, and auto-gptq packages.

    pip3 install "transformers>=4.32.0" "optimum>=1.12.0"
    pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
    
  2. Model Download: Either use the model downloader built into text-generation-webui or clone the repository directly with git. Each quantization variant lives on its own branch, so choose the branch you need (a programmatic download sketch follows these steps).

    git clone --single-branch --branch main https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ
    
  3. Load the Model: Use Python to load the model and tokenizer.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ"
    
    # device_map="auto" places the quantized weights on the available GPU(s).
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
    
  4. Generate Responses: Use the model to generate text (a note on the prompt template follows these steps).

    prompt = "Tell me about AI"
    # Move the tokenized prompt onto the model's device before generating.
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(model.device)
    # max_new_tokens avoids the very short default generation length.
    output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    
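To download a particular quantization branch programmatically rather than via git, the huggingface_hub library can be used. This is a minimal sketch: "main" holds the default build, and any other branch name should be taken from the repository's branch list.

    from huggingface_hub import snapshot_download

    # revision selects the branch carrying the desired quantization variant;
    # replace "main" with another branch name from the repository if needed.
    snapshot_download(
        repo_id="TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ",
        revision="main",
        local_dir="Wizard-Vicuna-30B-Uncensored-GPTQ",
    )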

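Wizard-Vicuna models are normally prompted with a Vicuna-style USER/ASSISTANT template rather than a bare string. The system sentence below is an assumed example based on the model's description; check the upstream model card for the exact template. The snippet reuses the model and tokenizer loaded in step 3.

    # Vicuna-style prompt template (the system sentence is an assumed example).
    prompt = "Tell me about AI"
    prompt_template = (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
        f"USER: {prompt} ASSISTANT:"
    )
    input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.to(model.device)
    output = model.generate(input_ids, max_new_tokens=512)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
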
Cloud GPUs

For optimal performance, it is recommended to use cloud services that provide GPUs, such as AWS, Google Cloud, or Azure, especially for the higher-precision quantization variants, which require more VRAM.

License

This model is released under a custom license; refer to the original repository for the specific terms and conditions. As with the underlying uncensored model, users are responsible for how the model is deployed and for any content it generates, much as they would be for any other potentially hazardous tool.
