orca_mini_v9_7_1B-Instruct
Introduction
Orca_Mini_v9_7_Llama-3.2-1B-Instruct is a model published by pankajmathur on the Hugging Face Hub and fine-tuned from Meta's Llama-3.2-1B-Instruct. It is designed for text generation tasks and was trained on a variety of SFT (supervised fine-tuning) datasets, making it suitable for general-purpose text generation applications.
Architecture
The model is based on the Llama-3.2-1B architecture, a decoder-only transformer for text generation. It is implemented with the transformers library, allowing flexible deployment and integration into various applications. Via bitsandbytes quantization, it can be loaded in 4-bit or 8-bit precision, which enables efficient deployment on hardware with limited resources.
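As a quick, minimal sketch (not from the model card itself), the architecture details can be confirmed by loading only the model configuration with transformers:

from transformers import AutoConfig

# Fetches just the config JSON, not the model weights.
config = AutoConfig.from_pretrained("pankajmathur/orca_mini_v9_7_1B-Instruct")
print(config.model_type)         # expected: "llama"
print(config.num_hidden_layers)  # transformer depth
print(config.hidden_size)        # residual stream width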
Training
Orca_Mini_v9_7_Llama-3.2-1B-Instruct was trained using a mix of human-generated and synthetic data to ensure quality and safety. The training process involved safety fine-tuning to prepare the model for various applications while minimizing risks. It includes mechanisms for handling refusals and ensuring a respectful tone in responses. The model supports further fine-tuning and customization to suit specific user needs.
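The card does not prescribe a specific fine-tuning recipe; as one illustration, a parameter-efficient LoRA setup with the peft library might look like the sketch below (the rank, alpha, and target modules are assumptions to tune for your task):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "pankajmathur/orca_mini_v9_7_1B-Instruct",
    device_map="auto",
)

# Train small adapter matrices instead of updating all base weights.
lora_config = LoraConfig(
    r=16,                                 # adapter rank (assumption)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # common choice for Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable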
Guide: Running Locally
To run the Orca Mini model locally, follow these basic steps:
- Set Up Environment: Install the Hugging Face transformers library along with torch, accelerate (required for device_map="auto"), and bitsandbytes for quantization support, e.g. pip install transformers torch accelerate bitsandbytes.
- Load the Model: Use the following Python code to load and run the model:
from transformers import pipeline

model_slug = "pankajmathur/orca_mini_v9_7_1B-Instruct"

# device_map="auto" places the model on a GPU when one is available.
pipe = pipeline(
    "text-generation",
    model=model_slug,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
    {"role": "user", "content": "Hello Orca Mini, what can you do for me?"},
]

# A very low temperature keeps sampling nearly deterministic.
outputs = pipe(
    messages,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.01,
    top_k=100,
    top_p=0.95,
)

# The pipeline returns the full chat transcript; the last entry is the
# assistant's reply.
print(outputs[0]["generated_text"][-1])
- Quantization Options: Configure 4-bit or 8-bit quantization for optimized performance on constrained hardware; see the sketch after this list.
- Execution: Run the script on a local machine or a cloud GPU. Cloud GPU services such as Google Colab (with a T4 GPU) are recommended for good performance.
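For the quantization options above, a minimal 4-bit loading sketch using the standard BitsAndBytesConfig API from transformers looks like this (the NF4 quantization type and compute dtype are illustrative defaults, not settings from the model card):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization; use BitsAndBytesConfig(load_in_8bit=True) for 8-bit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # float16 works on older GPUs like the T4
)

model = AutoModelForCausalLM.from_pretrained(
    "pankajmathur/orca_mini_v9_7_1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("pankajmathur/orca_mini_v9_7_1B-Instruct")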
License
The Orca_Mini_v9_7_Llama-3.2-1B-Instruct model is released under the Llama 3.2 Community License (tagged llama3.2 on Hugging Face). Users are encouraged to provide proper credit and attribution when using the model. The license allows further fine-tuning, merging, and customization, encouraging innovation and adaptation to specific needs.