Orca_Mini_v9_5_Llama-3.2-1B-Instruct

pankajmathur

Introduction

Orca_Mini_v9_5_Llama-3.2-1B-Instruct is a text generation model built on Meta's Llama-3.2 model family, designed to serve as a versatile AI assistant. It is optimized for safety and effectiveness across a range of text generation tasks and supports further customization to fit specific needs.

Architecture

The model is based on Llama-3.2-1B, the smallest model in the Llama-3.2 family. It was fine-tuned on various SFT datasets to strengthen its instruction-following capability, allowing it to handle a wide range of conversational tasks efficiently. It also supports several quantization configurations, including 4-bit and 8-bit formats, for reduced memory use on different hardware setups.

Training

The model was fine-tuned using a combination of human-generated and synthetic data. This approach ensures high-quality responses while mitigating potential safety risks. The fine-tuning process emphasizes safe interaction, refusal handling, and appropriate response tones to adversarial prompts.

Guide: Running Locally

To run the model locally:

  1. Install the required libraries:

    • transformers for model loading and inference.
    • bitsandbytes for quantization support.
  2. Set up the model pipeline:

    • Use the pipeline API from transformers to load the model.
    • Configure the model to run in the desired precision (e.g., bfloat16, 4-bit, or 8-bit).
  3. Example Code:

    import torch
    from transformers import pipeline
    
    model_slug = "pankajmathur/orca_mini_v9_5_1B-Instruct"
    generator = pipeline(
        "text-generation",
        model=model_slug,
        torch_dtype=torch.bfloat16,  # or pass a quantization config for 4-/8-bit
        device_map="auto",
    )
    messages = [
        {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
        {"role": "user", "content": "Hello Orca Mini, what can you do for me?"}
    ]
    outputs = generator(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
    print(outputs[0]["generated_text"][-1])  # last message in the chat is the assistant's reply
    
  4. Cloud GPUs:

    • For efficient performance, consider using cloud GPUs like Google Colab with a T4 GPU.
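The libraries from step 1 (plus accelerate, which `device_map="auto"` relies on) can be installed with pip. Package names below are the standard PyPI ones; exact versions are not pinned by this card:

```shell
# Install the inference stack for running the model locally
pip install torch transformers accelerate bitsandbytes
```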

License

The model is released under the Llama 3.2 license. Users are encouraged to credit the model appropriately and may adapt it for further fine-tuning or specific use cases.
