Phi-3 Vision-128K-Instruct ONNX

Introduction

The Phi-3 Vision-128K-Instruct ONNX model is a lightweight, multimodal model developed by Microsoft and optimized for accelerated inference with ONNX Runtime on both CPU and GPU. Part of Microsoft's Phi-3 model family, it is trained on high-quality, reasoning-dense text and vision data, supports a context length of up to 128K tokens, and undergoes extensive fine-tuning for precise instruction adherence and safety.

Architecture

The Phi-3 Vision model leverages ONNX for optimized inference across various platforms, including servers, desktops, and mobile devices. It includes specific optimizations:

  • INT4 CPU: int4 quantization via RTN (round-to-nearest) for CPU execution (see the sketch after this list).
  • INT4 GPU: int4 quantization via RTN (round-to-nearest) for GPU execution.
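
To make the RTN idea concrete, here is a minimal NumPy sketch of symmetric per-group int4 round-to-nearest quantization. The group size of 32 and the symmetric [-8, 7] mapping are illustrative assumptions; the actual ONNX Runtime quantizer has its own configuration and weight-packing format.

```python
import numpy as np

def rtn_int4_quantize(w, group_size=32):
    """Symmetric round-to-nearest (RTN) int4 quantization of a weight matrix.

    Each group of `group_size` consecutive weights shares one fp32 scale,
    chosen so the group's largest magnitude maps onto the int4 range
    [-8, 7]. RTN needs no calibration data, which keeps it cheap to apply.
    """
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def rtn_int4_dequantize(q, scales, shape):
    """Reconstruct approximate fp32 weights from int4 codes and scales."""
    return (q.astype(np.float32) * scales).reshape(shape)

# Round-trip check on a random weight matrix.
w = np.random.randn(128, 64).astype(np.float32)
q, s = rtn_int4_quantize(w)
w_hat = rtn_int4_dequantize(q, s, w.shape)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```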

Training

The model has been enhanced through supervised fine-tuning followed by direct preference optimization (DPO), ensuring precise instruction adherence and robust safety measures. Reported performance metrics indicate that the ONNX vision model's token-generation output is similar to that of the Phi-3-mini-128k-instruct-onnx model.
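
For reference, direct preference optimization tunes the model directly on preference pairs rather than through a separate reward model. The snippet below is a minimal sketch of the standard DPO objective, not Microsoft's training code; all argument names are illustrative.

```python
import numpy as np

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO objective for one preference pair.

    Inputs are summed log-probabilities of the preferred (chosen) and
    dispreferred (rejected) responses under the policy being tuned and
    under a frozen reference model. beta scales the implicit KL penalty
    that keeps the policy close to the reference.
    """
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), computed stably as log(1 + exp(-margin)).
    return np.logaddexp(0.0, -margin)

print(dpo_loss(-10.0, -14.0, -11.0, -13.0))  # policy favors chosen -> loss below log(2)
```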

Guide: Running Locally

To run the Phi-3 Vision-128K-Instruct ONNX model locally, follow these steps:

  1. Install ONNX Runtime: Ensure that ONNX Runtime is installed and configured on your machine.
  2. Hardware Requirements:
    • CPU: Intel Core i9-10920X or equivalent with 16GB RAM.
    • GPU: A100 GPU, RTX 4080, or any GPU with Compute Capability >= 7.0 (CUDA).
    • Operating System: Windows with DirectX 12-capable GPU and at least 10GB RAM.
  3. API Access: Integrate the model into your application through the ONNX Runtime generate() API (the onnxruntime-genai package).
  4. Run Inference: Execute inference using the ONNX Runtime API; a minimal sketch follows this list.
  5. Verify Output: Test and verify the model output for your specific applications.
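
As a concrete starting point for steps 1, 3, and 4, here is a minimal sketch based on the onnxruntime-genai multimodal API. The model directory, image path, and prompt text are placeholder assumptions; the explicit compute_logits/generate_next_token loop follows the generate() API contemporary with this model and may be folded into a single call in newer releases.

```python
# pip install onnxruntime-genai        (CPU)
# pip install onnxruntime-genai-cuda   (CUDA GPUs)
import onnxruntime_genai as og

# Placeholder path: point this at the downloaded int4 variant that
# matches your hardware (the CPU or GPU folder of the ONNX model).
model_dir = "./Phi-3-vision-128k-instruct-onnx/cpu-int4-rtn-block-32"

model = og.Model(model_dir)
processor = model.create_multimodal_processor()
tokenizer_stream = processor.create_stream()

# Load an image and wrap the question in the Phi-3 vision chat template.
images = og.Images.open("example_image.jpg")
prompt = "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=images)

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=3072)

# Stream tokens as they are generated so output can be inspected live.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]),
          end="", flush=True)
print()
```

Step 5 then amounts to spot-checking the streamed output against expectations for your own images and prompts.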

Cloud GPUs: For enhanced performance, consider using cloud services like Azure with A100 GPUs.

License

The Phi-3 Vision-128K-Instruct ONNX model is licensed under the MIT License. Users are responsible for verifying and testing model outputs for their specific scenarios. The model's performance may slightly differ from the base model due to optimizations.
