DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF
Introduction
DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF is QuantFactory's quantized build of a text-generation model designed for multilingual dialogue. It supports roleplay and conversational tasks, is optimized for mobile use, and delivers quick responses for both text and code generation. The model is uncensored and includes a variety of role-playing capabilities.
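Because the weights are distributed in GGUF format, a common way to run them locally is through llama.cpp bindings rather than raw PyTorch weights. Below is a minimal sketch using llama-cpp-python; the repo id follows this card's title, but the quant filename glob is an assumption, so check the repository's file list for an actual file.

```python
# pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# from_pretrained downloads a GGUF file straight from the Hugging Face Hub.
# The filename glob below is an assumption; match it to a real file in the repo.
llm = Llama.from_pretrained(
    repo_id="QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```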
Architecture
DarkIdol-Llama-3.1-8B is an auto-regressive language model built on a transformer architecture. It uses supervised fine-tuning and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The model is designed for multilingual use and supports 11 languages.
Training
The model was pretrained on roughly 15 trillion tokens from publicly available sources and fine-tuned on more than 25 million synthetic examples. It uses Grouped-Query Attention (GQA), which shares key/value heads across groups of query heads to improve inference scalability. Training consumed substantial computational resources, with stated efforts to minimize the associated greenhouse gas emissions.
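To make the GQA point concrete, the toy sketch below shows the core idea: many query heads share a smaller set of key/value heads, which shrinks the KV cache that must be kept in memory during inference. This is an illustrative sketch, not Meta's implementation; the 32-query/8-KV head split matches the published Llama-3.1-8B configuration.

```python
import math
import torch

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim)
    # k, v: (batch, n_kv_heads, seq, head_dim), with n_q_heads % n_kv_heads == 0
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group_size = n_q_heads // n_kv_heads
    # Each K/V head serves a whole group of query heads, so the KV cache is
    # group_size times smaller than in standard multi-head attention.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v

# Llama-3.1-8B pairs 32 query heads with 8 KV heads (head_dim 128).
q = torch.randn(1, 32, 16, 128)
k = torch.randn(1, 8, 16, 128)
v = torch.randn(1, 8, 16, 128)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 32, 16, 128])
```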
Guide: Running Locally
- Install Prerequisites: ensure the `transformers` library is at version >= 4.43.0, and install the additional libraries with `pip install datasets openai`.
- Setup Model: import the necessary libraries (`transformers` and `torch`), then initialize the model using Hugging Face's Transformers pipeline:

```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
```
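Note that the snippet above loads Meta's base Instruct checkpoint by its id. To load the quantized DarkIdol GGUF build through Transformers instead, recent versions (>= 4.41) can read a GGUF checkpoint directly, dequantizing it into regular torch tensors at load time; the filename below is a hypothetical placeholder, so check the repository's file list.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF"
# Hypothetical filename; replace with an actual .gguf file from the repo.
gguf_file = "DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.Q4_K_M.gguf"

# gguf_file tells transformers to parse the GGUF checkpoint and dequantize
# its weights into standard torch tensors.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```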
- Run Inference: prepare input messages and generate responses:

```python
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
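If you want more control than the pipeline's built-in chat handling, the same conversation can be formatted explicitly with the tokenizer's chat template. A minimal sketch, assuming the `pipeline` object and `messages` list from the previous steps are already defined:

```python
import torch

# Reuse the tokenizer and model held by the pipeline from the setup step.
tokenizer = pipeline.tokenizer
model = pipeline.model

# apply_chat_template renders the messages into the Llama 3.1 prompt format
# and returns token ids ready for generate().
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, i.e. the assistant's reply.
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```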
- Consider Cloud GPUs: utilize cloud services such as AWS, Google Cloud, or Azure for GPU resources to improve performance and manage large-scale inference efficiently.
License
The model is released under the Llama 3.1 Community License. For commercial licensing or further details, refer to the license documentation in Meta's license repository.