orca_alpaca_3b
pankajmathur

Introduction
ORCA_ALPACA_3B is an Open_LLaMA-3B model trained on explain-tuned datasets. It uses instructions and inputs from the Alpaca dataset and applies the dataset construction methods of the Orca Research Paper.
Architecture
ORCA_ALPACA_3B is built on the Open_LLaMA architecture and designed for text generation tasks. It integrates system instructions from the Orca Research Paper to enhance its training data, enabling the model to mimic the thought processes of teacher models like ChatGPT.
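To make the data-construction idea concrete, here is an illustrative sketch of how an Alpaca record could be augmented with an Orca-style system instruction. The field names and the exact system message are assumptions; the author's actual construction code is not included in this card.

```python
# Illustrative sketch only: augmenting an Alpaca record with an Orca-style
# system instruction. Field names and the system message are assumptions;
# the author's construction code is not published in this card.
alpaca_record = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
}

# An Orca-style system message (paraphrased) that elicits step-by-step,
# explanatory answers from the teacher model.
system_instruction = (
    "You are an AI assistant. While performing the task, "
    "think step by step and justify your steps."
)

explain_tuned_sample = {"system": system_instruction, **alpaca_record}
```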
Training
The model was trained with DeepSpeed using the ZeRO stage 3 approach on four A600 (50G) GPUs for approximately 20 hours, at a cost of around $66. Training used a custom script that leverages some code from the OpenAlpaca repository. Key training parameters include (a configuration sketch follows the list):
- Batch size: 16
- Train micro-batch size per GPU: 2
- Gradient accumulation steps: 2
- Learning rate: 2e-5
- Max length: 1024
- Epochs: 3
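For orientation, a DeepSpeed ZeRO stage 3 configuration consistent with these numbers (2 micro-batch × 2 accumulation steps × 4 GPUs = global batch size 16) might look like the sketch below. This is not the author's actual configuration; the optimizer choice and precision settings in particular are assumptions.

```python
# Hypothetical DeepSpeed config matching the reported hyperparameters.
# Optimizer choice (AdamW) and fp16 are assumptions, not from the card.
ds_config = {
    "train_batch_size": 16,               # 2 micro x 2 accum x 4 GPUs
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 2,
    "zero_optimization": {"stage": 3},    # ZeRO stage 3 partitioning
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5},
    },
    "fp16": {"enabled": True},
}
```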
Guide: Running Locally
To run the ORCA_ALPACA_3B model locally, follow these steps:
- Install Dependencies: ensure the `torch` and `transformers` libraries are installed.
- Load Model and Tokenizer:
```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = 'psmathur/alpaca_orca_open_llama_3b'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map='auto',
)
```
- Generate Text: use a helper function to generate text from a system prompt and an instruction; a sketch of such a function follows.
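A minimal sketch of such a helper, using the model and tokenizer loaded above. The prompt layout (`### System` / `### User` / `### Response` markers) and the sampling settings are assumptions based on common Alpaca/Orca-style formats, not the card's exact function:

```python
# Minimal generation helper; prompt format and sampling values are assumptions.
def generate_text(system, instruction, input_text=None):
    if input_text:
        prompt = (f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
                  f"### Input:\n{input_text}\n\n### Response:\n")
    else:
        prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=256,   # cap on generated tokens (assumption)
            do_sample=True,
            top_p=0.95,
            temperature=0.7,
        )
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate_text("You are a helpful assistant.", "Explain photosynthesis."))
```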
For better performance, consider using cloud GPU services like AWS EC2, Google Cloud, or Lambda Labs.
License
The model is licensed under CC BY-NC-SA 4.0, which allows sharing and adapting the work non-commercially, as long as attribution is provided and derivatives are licensed under the same terms.