Phi-3 Mini-4K-Instruct ONNX
Introduction
The Phi-3 Mini-4K-Instruct ONNX models are optimized builds of Microsoft's Phi-3 Mini-4K-Instruct, designed to accelerate inference with ONNX Runtime. The underlying model targets text generation and was trained on high-quality, reasoning-dense data; the ONNX builds are published in several precisions so they can run efficiently across a range of platforms.
Architecture
Phi-3 Mini is part of the Phi-3 model family. It was trained on a mix of synthetic data and filtered public web data, then refined with fine-tuning and preference optimization. The models are published in ONNX format to support both CPU and GPU execution on Windows, Linux, and macOS, as well as on mobile devices. On Windows, DirectML support provides hardware acceleration across AMD, Intel, and NVIDIA GPUs.
Training
The model was post-trained with supervised fine-tuning combined with direct preference optimization (DPO) to ensure precise instruction adherence and robust safety measures. The published ONNX variants apply reduced-precision optimizations such as int4 quantization and fp16 to balance performance and accuracy across hardware configurations.
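To give a concrete sense of what int4 quantization involves, below is a minimal NumPy sketch of block-wise round-to-nearest (RTN) weight quantization. The block size of 32 and the symmetric scaling scheme are illustrative assumptions, not the exact recipe used to produce these models.

    # Illustrative block-wise round-to-nearest (RTN) int4 quantization.
    # Block size and symmetric scaling are assumptions for illustration.
    import numpy as np

    def quantize_int4_rtn(weights, block_size=32):
        # One scale per block maps each block's float range onto [-8, 7].
        w = weights.reshape(-1, block_size)
        scales = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / 7.0
        q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
        return q, scales

    def dequantize_int4(q, scales):
        # Reconstruct approximate float weights from int4 values + scales.
        return (q.astype(np.float32) * scales).reshape(-1)

    w = np.random.randn(128).astype(np.float32)
    q, s = quantize_int4_rtn(w)
    print("max abs reconstruction error:", np.abs(w - dequantize_int4(q, s)).max())

Storing one 4-bit value plus a shared per-block scale in place of each fp32 weight is what shrinks the model roughly 8x and speeds up memory-bound inference.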
Guide: Running Locally
- Install ONNX Runtime: Install ONNX Runtime together with the onnxruntime-genai package that provides the generate() API for your hardware target (see the setup commands after this list).
- Download the Model: Obtain the Phi-3 Mini-4K-Instruct ONNX model from the repository (an example download command also follows the list).
- Select Execution Environment: Choose an appropriate environment (CPU, GPU, or mobile) based on your hardware.
- Run the Model: Use the ONNX Runtime generate() API to execute the model, either via the bundled model-qa.py example script or directly from Python (see the sketch after this list). Example command:
    python model-qa.py -m {YourModelPath}/onnx/cpu_and_mobile/phi-3-mini-4k-instruct-int4-cpu -k 40 -p 0.95 -t 0.8 -r 1.0
- Cloud GPUs: Consider using cloud services with NVIDIA A100 or similar GPUs for enhanced performance.
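For the first two steps, a typical setup might look like the following. The package choice depends on your hardware target (onnxruntime-genai for CPU, onnxruntime-genai-cuda for NVIDIA GPUs, onnxruntime-genai-directml for DirectML on Windows), and the repository ID and folder filter below are assumptions mirroring the example path above.

    # Install the generate() API package for your hardware (pick one).
    pip install onnxruntime-genai
    # pip install onnxruntime-genai-cuda
    # pip install onnxruntime-genai-directml

    # Download one variant of the model from the Hugging Face repository.
    pip install "huggingface_hub[cli]"
    huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include "onnx/cpu_and_mobile/*" --local-dir .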
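For the final step, beyond the model-qa.py script, the model can also be driven directly from Python. Below is a minimal sketch; the exact method names (GeneratorParams, set_search_options, append_tokens, create_stream) have shifted across onnxruntime-genai releases, so treat them as version-dependent rather than definitive.

    # Minimal chat-style generation with the ONNX Runtime generate() API.
    # Method names reflect recent onnxruntime-genai releases; the model
    # path mirrors the example command above.
    import onnxruntime_genai as og

    model = og.Model("{YourModelPath}/onnx/cpu_and_mobile/phi-3-mini-4k-instruct-int4-cpu")
    tokenizer = og.Tokenizer(model)

    # Phi-3 instruct models expect this chat template around the prompt.
    prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"

    params = og.GeneratorParams(model)
    params.set_search_options(do_sample=True, max_length=256, top_k=40, top_p=0.95, temperature=0.8)

    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))

    # Stream tokens to stdout as they are generated.
    stream = tokenizer.create_stream()
    while not generator.is_done():
        generator.generate_next_token()
        print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)

The sampling settings mirror the flags in the example command (-k 40 -p 0.95 -t 0.8).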
License
The Phi-3 Mini-4K-Instruct ONNX models are released under the MIT License, which permits broad use, modification, and redistribution provided the copyright and license notice is retained.