DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF

QuantFactory

Introduction

DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF is a quantized text-generation model designed for multilingual dialogue. It supports roleplay and conversational tasks, is optimized for mobile use, and delivers fast responses for text and code generation. The model is uncensored and ships with a variety of role-playing capabilities.
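
Because this release ships as GGUF files, one common way to run it locally is llama-cpp-python rather than transformers. The sketch below assumes the QuantFactory repository ID and a Q4_K_M quantization file; adjust both to match the files actually published in the repo:

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Download a quantized file from the Hugging Face Hub and load it.
    # Repo ID and filename pattern are assumptions; pick whichever .gguf
    # quantization level the repository actually provides.
    llm = Llama.from_pretrained(
        repo_id="QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF",
        filename="*Q4_K_M.gguf",  # glob for a 4-bit quant; adjust as needed
        n_ctx=4096,               # context window in tokens
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
        max_tokens=128,
    )
    print(out["choices"][0]["message"]["content"])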

Architecture

DarkIdol-Llama-3.1-8B is an auto-regressive language model built on a transformer architecture. It uses supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align outputs with human preferences for helpfulness and safety. The model is designed for multilingual use, covering the eight languages officially supported by Llama 3.1 (English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai).
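
"Auto-regressive" means the model produces one token at a time, each conditioned on the full sequence generated so far. A minimal greedy-decoding loop with transformers makes this concrete (the model ID is the base Meta checkpoint reused in the guide below; the prompt is illustrative):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    ids = tokenizer("The capital of France is", return_tensors="pt").input_ids.to(model.device)
    for _ in range(10):                                          # up to 10 new tokens
        logits = model(ids).logits                               # forward pass over the prefix
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy: most likely next token
        ids = torch.cat([ids, next_id], dim=-1)                  # append and feed back in
        if next_id.item() == tokenizer.eos_token_id:
            break
    print(tokenizer.decode(ids[0], skip_special_tokens=True))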

Training

The base model was pretrained on approximately 15 trillion tokens of publicly available data and fine-tuned on more than 25 million synthetically generated examples. It uses Grouped-Query Attention (GQA) for improved inference scalability. Training required substantial computational resources, and Meta reports working to minimize the associated greenhouse gas emissions.
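
In GQA, several query heads share a single key/value head, which shrinks the KV cache and speeds up inference (Llama 3 class models use 32 query heads and 8 key/value heads at the 8B scale). A toy sketch of the mechanism, with deliberately small illustrative dimensions:

    import torch

    # Toy grouped-query attention: 8 query heads share 2 KV heads,
    # so each KV head serves a group of 4 query heads.
    B, T, n_q, n_kv, d = 1, 5, 8, 2, 16
    q = torch.randn(B, n_q, T, d)
    k = torch.randn(B, n_kv, T, d)
    v = torch.randn(B, n_kv, T, d)

    # Expand each KV head across its query group before standard attention.
    group = n_q // n_kv
    k = k.repeat_interleave(group, dim=1)   # (B, n_q, T, d)
    v = v.repeat_interleave(group, dim=1)

    attn = torch.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)
    out = attn @ v                          # same shape as full multi-head output
    print(out.shape)                        # torch.Size([1, 8, 5, 16])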

Guide: Running Locally

  1. Install Prerequisites:

    • Ensure the transformers library is at version 4.43.0 or later (pip install "transformers>=4.43.0").
    • Install the additional libraries with pip install datasets openai.
  2. Setup Model:

    • Import necessary libraries: import transformers and import torch.
    • Initialize the model using Hugging Face's Transformers pipeline:
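      # Note: this is Meta's base Instruct checkpoint, copied from the upstream
      # example; substitute the DarkIdol repository ID to run that variant instead.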
      model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
      pipeline = transformers.pipeline(
          "text-generation",
          model=model_id,
          model_kwargs={"torch_dtype": torch.bfloat16},
          device_map="auto",
      )
      
  3. Run Inference:

    • Prepare input messages and generate responses (a lower-level variant is sketched after this list):
      messages = [
          {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
          {"role": "user", "content": "Who are you?"},
      ]
      
      outputs = pipeline(
          messages,
          max_new_tokens=256,
      )
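      # With chat-style input, generated_text holds the whole conversation;
      # the last entry is the assistant's newly generated reply.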
      print(outputs[0]["generated_text"][-1])
      
  4. Consider Cloud GPUs:

    • Utilize cloud services like AWS, Google Cloud, or Azure for GPU resources to enhance performance and manage large-scale inference efficiently.
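
For finer control than the pipeline in step 3, the same chat can be driven through the tokenizer and model directly. This is the standard transformers pattern rather than anything specific to this model card:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]

    # Render the messages with Llama 3.1's chat template and append the
    # header that cues the assistant's turn.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=256)
    # Slice off the prompt tokens so only the new reply is decoded.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))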

License

The model operates under the Llama 3.1 Community License. For commercial licensing or further details, please refer to the license documentation available at Meta's license repository.
