orca_mini_v9_5_1B-Instruct
pankajmathur

Introduction
Orca_Mini_v9_5_Llama-3.2-1B-Instruct is a text-generation model built on Llama-3.2, designed to serve as a versatile AI assistant. It is optimized for safety and effectiveness across a range of text-generation tasks and can be further customized to fit specific needs.
Architecture
The model is based on Llama-3.2-1B, the smaller sibling of Llama-3.2-3B. It utilizes various SFT datasets to strengthen its instruction-following capability, allowing it to handle a wide range of conversational tasks efficiently. It supports several quantization configurations, including 4-bit and 8-bit formats, for optimized performance on different hardware setups.
Training
The model was fine-tuned on a combination of human-generated and synthetic data. This approach helps ensure high-quality responses while mitigating potential safety risks. The fine-tuning process emphasizes safe interaction, refusal handling, and an appropriate tone when responding to adversarial prompts.
Guide: Running Locally
To run the model locally:
- Install the required libraries: `transformers` for model loading and inference, and `bitsandbytes` for quantization support.
- Set up the model pipeline: use the `pipeline` API from `transformers` to load the model, and configure it to run in the desired precision (e.g., bfloat16, 4-bit, or 8-bit).
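The libraries above can be fetched with pip; `accelerate` is an assumed extra dependency, typically needed when loading with `device_map="auto"`:

```shell
pip install transformers accelerate bitsandbytes
```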
- Example code:

```python
import torch
from transformers import pipeline

model_slug = "pankajmathur/orca_mini_v9_5_1B-Instruct"

# Load the model; device_map="auto" places it on the available GPU/CPU
pipe = pipeline(
    "text-generation",
    model=model_slug,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
    {"role": "user", "content": "Hello Orca Mini, what can you do for me?"},
]

outputs = pipe(
    messages,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.01,
    top_k=100,
    top_p=0.95,
)
print(outputs[0]["generated_text"][-1])
```
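When called with a list of chat messages, the text-generation pipeline returns the full conversation with the assistant's reply appended. The sketch below mocks that output structure (the reply text is a placeholder, not real model output) to show how to pull out just the response:

```python
# Mocked pipeline output: a list with one dict whose "generated_text" holds
# the chat history, including the newly generated assistant message.
outputs = [{
    "generated_text": [
        {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
        {"role": "user", "content": "Hello Orca Mini, what can you do for me?"},
        {"role": "assistant", "content": "(assistant reply placeholder)"},
    ]
}]

# The last message is the assistant's reply; its "content" field is the text.
reply = outputs[0]["generated_text"][-1]
print(reply["content"])  # → (assistant reply placeholder)
```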
- Cloud GPUs: for efficient performance, consider using a cloud GPU, such as a T4 on Google Colab.
License
The model is released under the llama3.2 license. Users are encouraged to credit appropriately and are permitted to adapt the model for further fine-tuning or specific use cases.