orca_alpaca_3b

pankajmathur

Introduction

The ORCA_ALPACA_3B model is an Open_LLaMA-3B model fine-tuned on explain-tuned datasets. It uses the instructions and inputs from the Alpaca dataset and applies the dataset-construction methods described in the Orca Research Paper.

Architecture

ORCA_ALPACA_3B is built on the Open_LLaMA architecture and is designed for text-generation tasks. Its training data is augmented with the system instructions from the Orca Research Paper, which teach the model to mimic the reasoning process of teacher models such as ChatGPT.

Training

The model was trained with DeepSpeed using ZeRO stage 3 on four A600 (50G) GPUs for approximately 20 hours, at a cost of around $66. Training used a custom script that borrows some code from the OpenAlpaca repository. Key training parameters include:

  • Batch size: 16
  • Train micro-batch size per GPU: 2
  • Gradient accumulation steps: 2
  • Learning rate: 2e-5
  • Max length: 1024
  • Epochs: 3
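The parameters above can be sketched as a DeepSpeed configuration. This is a hypothetical fragment, not the authors' actual config: the key names follow DeepSpeed's JSON schema, the values come from the list above, and the optimizer choice (AdamW) is an assumption.

```python
# Hypothetical DeepSpeed config mirroring the listed training parameters.
# Optimizer type is an assumption; the values are from the model card.
ds_config = {
    "train_batch_size": 16,               # global batch size
    "train_micro_batch_size_per_gpu": 2,  # micro-batch per GPU
    "gradient_accumulation_steps": 2,
    "zero_optimization": {"stage": 3},    # ZeRO stage 3 parameter sharding
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
}

# Consistency check: global batch = micro-batch x grad-accum x num GPUs
num_gpus = 4
assert ds_config["train_batch_size"] == (
    ds_config["train_micro_batch_size_per_gpu"]
    * ds_config["gradient_accumulation_steps"]
    * num_gpus
)
```

Note how the listed numbers are internally consistent: 2 (micro-batch) x 2 (accumulation) x 4 (GPUs) gives the global batch size of 16.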

Guide: Running Locally

To run the ORCA_ALPACA_3B model locally, follow these steps:

  1. Install Dependencies: Ensure the torch and transformers libraries are installed.
  2. Load Model and Tokenizer:
    import torch
    from transformers import LlamaForCausalLM, LlamaTokenizer
    
    model_path = 'psmathur/alpaca_orca_open_llama_3b'
    tokenizer = LlamaTokenizer.from_pretrained(model_path)
    # Load in half precision and let accelerate place layers across devices
    model = LlamaForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.float16, device_map='auto',
    )
    
  3. Generate Text: Use the provided function to generate text based on system prompts and instructions.
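Since the generation function itself is not reproduced here, the following is a minimal sketch of how such a helper might look. The exact prompt template (the `### System` / `### User` / `### Response` layout), the system-prompt wording, and the generation settings are assumptions; check the model card for the canonical format.

```python
def build_prompt(system: str, instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca/Orca-style prompt.

    The section layout below is an assumed template, not confirmed by
    this document; adjust it to match the model card.
    """
    if input_text:
        return (f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
                f"### Input:\n{input_text}\n\n### Response:\n")
    return f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

# Example system prompt and instruction (illustrative values)
prompt = build_prompt(
    "You are an AI assistant that follows instructions extremely well.",
    "Explain photosynthesis to a five year old.",
)

# The prompt would then be tokenized and passed to the model, e.g.:
#   tokens = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
#   output = model.generate(tokens, max_new_tokens=256)
#   print(tokenizer.decode(output[0], skip_special_tokens=True))
```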

For better performance, consider using cloud GPU services like AWS EC2, Google Cloud, or Lambda Labs.

License

The model is licensed under CC BY-NC-SA 4.0, which allows sharing and adapting the work non-commercially, as long as attribution is provided and derivatives are licensed under the same terms.