orca_mini_v9_7_1B-Instruct

pankajmathur

Introduction

Orca_Mini_v9_7_Llama-3.2-1B-Instruct is a text generation model fine-tuned from Llama-3.2-1B-Instruct on various SFT (supervised fine-tuning) datasets. It is distributed through the Hugging Face ecosystem and is suitable for general-purpose text generation applications.

Architecture

The model is based on the Llama-3.2-1B architecture, a transformer model for text generation. It loads through the transformers library, which allows flexible deployment and integration into applications. It can also be loaded with 4-bit or 8-bit quantization, which reduces the memory needed for the weights and makes deployment on resource-limited hardware practical.
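To make the memory benefit of 4-bit and 8-bit loading concrete, the following is a rough back-of-the-envelope sketch. The parameter count of about 1.2 billion is an assumption for a model of this size class, not a figure stated in this card, and the helper name weight_memory_gb is hypothetical. The estimate covers weight storage only; activations, the KV cache, and quantization metadata add to it.

```python
# Rough weight-memory estimate at different precisions.
# ASSUMPTION: ~1.2e9 parameters; check the model's config for the exact count.
ASSUMED_PARAM_COUNT = 1_200_000_000

BITS_PER_WEIGHT = {"fp16/bf16": 16, "8-bit": 8, "4-bit": 4}


def weight_memory_gb(param_count: int, bits: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return param_count * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    size = weight_memory_gb(ASSUMED_PARAM_COUNT, bits)
    print(f"{name:>10}: ~{size:.1f} GB")
```

Under that assumption, halving the bits per weight halves the weight footprint, which is why the quantized formats fit more comfortably on small GPUs.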

Training

Orca_Mini_v9_7_Llama-3.2-1B-Instruct was trained using a mix of human-generated and synthetic data to ensure quality and safety. The training process involved safety fine-tuning to prepare the model for various applications while minimizing risks. It includes mechanisms for handling refusals and ensuring a respectful tone in responses. The model supports further fine-tuning and customization to suit specific user needs.

Guide: Running Locally

To run the Orca Mini model locally, follow these basic steps:

  1. Set Up Environment: Install the Hugging Face transformers library and its dependencies: torch, accelerate (required for device_map="auto"), and bitsandbytes for quantization support.
  2. Load the Model: Use the following Python code to load and run the model:
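    from transformers import pipeline
    
    model_slug = "pankajmathur/orca_mini_v9_7_1B-Instruct"
    # Name the instance `pipe` so it does not shadow the imported `pipeline` function.
    pipe = pipeline(
        "text-generation",
        model=model_slug,
        device_map="auto",  # place the weights on the available GPU/CPU automatically
    )
    # Chat-style input: a system prompt followed by the user's message.
    messages = [
        {"role": "system", "content": "You are Orca Mini, a helpful AI assistant."},
        {"role": "user", "content": "Hello Orca Mini, what can you do for me?"}
    ]
    # do_sample=True with a very low temperature makes sampling nearly deterministic.
    outputs = pipe(messages, max_new_tokens=128, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
    # The last entry of "generated_text" is the assistant's reply message.
    print(outputs[0]["generated_text"][-1])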
    
  3. Quantization Options: Load the model in 4-bit or 8-bit precision (via bitsandbytes) to reduce memory use on constrained hardware.
  4. Execution: Run the script on a local machine or a cloud GPU. Cloud GPU services such as Google Colab (with a T4 GPU) are recommended.
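The quantization step can be sketched as follows. This is a minimal illustration, not necessarily the author's exact configuration: the helper names quantization_options and load_quantized are hypothetical, and the choice of the "nf4" quantization type is an assumption. BitsAndBytesConfig, AutoModelForCausalLM, and AutoTokenizer are existing transformers classes; bitsandbytes loading typically requires a CUDA GPU.

```python
from typing import Any, Dict

MODEL_SLUG = "pankajmathur/orca_mini_v9_7_1B-Instruct"  # same slug as in step 2


def quantization_options(bits: int) -> Dict[str, Any]:
    """Return keyword arguments for transformers.BitsAndBytesConfig (hypothetical helper)."""
    if bits == 4:
        # ASSUMPTION: NF4 is chosen here for illustration; "fp4" is the alternative.
        return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}
    if bits == 8:
        return {"load_in_8bit": True}
    raise ValueError(f"unsupported bit width: {bits} (expected 4 or 8)")


def load_quantized(bits: int):
    """Load the model and tokenizer with bitsandbytes quantization."""
    # Imported lazily so quantization_options works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    config = BitsAndBytesConfig(**quantization_options(bits))
    tokenizer = AutoTokenizer.from_pretrained(MODEL_SLUG)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_SLUG,
        quantization_config=config,
        device_map="auto",  # requires the accelerate package
    )
    return model, tokenizer
```

Calling load_quantized(4) or load_quantized(8) returns a model and tokenizer that can be passed to pipeline("text-generation", model=model, tokenizer=tokenizer) in place of the model slug.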

License

The Orca_Mini_v9_7_Llama-3.2-1B-Instruct model is licensed under the llama3.2 license. Users are encouraged to provide proper credit and attribution when using the model. The license allows for further fine-tuning, merging, and customization, encouraging innovation and adaptation to specific needs.
