Phi-3.5 Vision Instruct ONNX

Microsoft

Introduction

The Phi-3.5 Vision Instruct ONNX models are optimized versions of the Phi-3.5-vision-instruct model, built to accelerate inference with ONNX Runtime on both CPU and GPU. They retain the base model's long context length (128K tokens) and are fine-tuned for precise instruction adherence and robust safety.

Architecture

The base model is a lightweight, state-of-the-art open multimodal model that processes both text and image inputs. It was trained on datasets that combine synthetic data with filtered, publicly available data, with a focus on high-quality, reasoning-dense content. The optimized ONNX versions run on a range of devices and platforms, including Windows, Linux, and macOS desktops as well as mobile CPUs.

Training

The base model was enhanced through supervised fine-tuning and direct preference optimization, which improve instruction following and safety alignment. The ONNX models use int4 weight quantization to reduce memory footprint and speed up inference on CPU and GPU.
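As a rough illustration of what int4 weight quantization does, the NumPy sketch below implements a toy block-wise round-to-nearest (RTN) scheme: each block of weights is scaled into the signed int4 range and rounded. The block size and layout here are assumptions chosen for illustration; they are not the exact recipe or tooling used to produce the published models.

```python
# Toy sketch of block-wise round-to-nearest (RTN) int4 weight quantization.
# Illustrative only: block size and scheme are assumptions, not the exact
# recipe used to produce the published ONNX models.
import numpy as np

def quantize_int4_rtn(w: np.ndarray, block_size: int = 32):
    """Quantize a flat weight array block-wise to signed int4 with per-block scales."""
    blocks = w.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0  # map max |w| to int4 range
    scales = np.maximum(scales, 1e-8)                         # avoid division by zero
    q = np.clip(np.round(blocks / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from int4 values and per-block scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_int4_rtn(w)
print("max reconstruction error:", np.abs(w - dequantize_int4(q, s)).max())
```

Storing 4-bit values plus one scale per block cuts weight memory to roughly a quarter of fp16, at the cost of the small reconstruction error printed above.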

Guide: Running Locally

  1. Install ONNX Runtime: Install ONNX Runtime together with the ONNX Runtime GenAI package for your target device (CPU, CUDA, or DirectML); install commands are noted in the sketch after this list.
  2. Download the Model: Fetch the optimized Phi-3.5 Vision Instruct ONNX models from the model repository, for example with huggingface-cli download.
  3. Set Up Environment: Ensure your system meets the minimum configuration requirements:
    • A CPU machine with at least 16 GB of RAM.
    • For GPU inference, a CUDA-capable NVIDIA GPU with Compute Capability >= 7.0.
    • For Windows, a DirectX 12-capable GPU and at least 10 GB of combined RAM.
  4. Run Inference: Use the provided API to integrate the model into your application, as shown in the sketch below.
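The sketch below ties steps 1 through 4 together using the onnxruntime-genai Python package. It is a minimal sketch, not a definitive implementation: the exact API surface has changed across onnxruntime-genai releases (this follows the style of the 0.4-era multimodal samples), and the model folder, image path, and generation settings are placeholders.

```python
# Minimal sketch: run Phi-3.5 Vision Instruct ONNX with onnxruntime-genai.
# Install first (one of): pip install onnxruntime-genai        (CPU)
#                         pip install onnxruntime-genai-cuda   (NVIDIA GPU)
# Paths and settings below are placeholders; API details vary by release.
import onnxruntime_genai as og

model = og.Model("path/to/phi-3.5-vision-instruct-onnx")  # downloaded model folder
processor = model.create_multimodal_processor()
tokenizer_stream = processor.create_stream()

# Phi-3.5 vision chat template with one image slot.
prompt = "<|user|>\n<|image_1|>\nDescribe this image.<|end|>\n<|assistant|>\n"
images = og.Images.open("example.jpg")
inputs = processor(prompt, images=images)

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=3072)

# Stream tokens to stdout as they are generated.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```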

For enhanced performance, consider a data-center GPU such as the NVIDIA A100 in the cloud, or a high-end consumer GPU such as the RTX 4080 locally.
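Before troubleshooting GPU performance, it is worth confirming that your installed ONNX Runtime build can actually see the GPU. The standard onnxruntime call below lists the available execution providers:

```python
import onnxruntime as ort

# Lists the execution providers the installed build supports, e.g.
# ['CUDAExecutionProvider', 'CPUExecutionProvider'] for a CUDA build.
print(ort.get_available_providers())
```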

License

The models are provided under the MIT License, which grants broad freedom to use, modify, and distribute the software, provided the original copyright and license notice are retained. Users remain responsible for verifying the model's outputs in their specific use cases.
