Phi-3.5-mini-instruct-GGUF

bartowski

Introduction

Phi-3.5-mini-instruct-GGUF is a multilingual text generation model based on Microsoft's Phi-3.5-mini-instruct. It is designed to handle a variety of NLP tasks, including code generation and conversational interfaces. The model is distributed in the GGUF format and supports inference endpoints.

Architecture

The model was quantized with the llama.cpp framework, using release b3751. The original model is hosted by Microsoft and was adapted by bartowski, who produced the quantizations with the imatrix option.
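
In rough terms, an imatrix quantization runs llama.cpp's calibration tool before quantizing. The sketch below is illustrative only: calibration.txt stands in for the calibration dataset, whose exact contents are not reproduced here.

    # Sketch of an imatrix quantization with llama.cpp (b3751) tools.
    # calibration.txt is a placeholder for the calibration dataset.
    llama-imatrix -m Phi-3.5-mini-instruct-f16.gguf -f calibration.txt -o imatrix.dat
    llama-quantize --imatrix imatrix.dat Phi-3.5-mini-instruct-f16.gguf Phi-3.5-mini-instruct-Q4_K_M.gguf Q4_K_M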

Training

Quantizations were calibrated against an imatrix dataset, and a range of quantized weights is available to suit different hardware configurations. Quantization levels run from very high quality (Q8_0) down to variants optimized for ARM inference (Q4_0_X_X).

Guide: Running Locally

  1. Install huggingface-cli:

    pip install -U "huggingface_hub[cli]"
    
  2. Download a specific file:

    huggingface-cli download bartowski/Phi-3.5-mini-instruct-GGUF --include "Phi-3.5-mini-instruct-Q4_K_M.gguf" --local-dir ./
    
  3. For models larger than 50GB:

    huggingface-cli download bartowski/Phi-3.5-mini-instruct-GGUF --include "Phi-3.5-mini-instruct-Q8_0/*" --local-dir ./
    
  4. Select the appropriate quant:

    • For ARM chips, use the Q4_0_X_X quants for enhanced speed.
    • Determine the maximum model size based on your RAM and VRAM, and choose a quant that fits within your constraints (see the sketch below).
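
     As a rough sanity check (assuming an NVIDIA GPU; this step is unnecessary for CPU-only setups), compare the quant's file size against the memory you have available:

    # Total GPU memory (NVIDIA only); leave a few GB of headroom for the KV cache.
    nvidia-smi --query-gpu=memory.total --format=csv
    # File sizes of the downloaded quants.
    ls -lh Phi-3.5-mini-instruct-*.gguf
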
  5. Consider cloud GPUs: for maximum performance, cloud services such as AWS or Google Cloud offer powerful GPUs that can significantly speed up model inference.
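
Once a quant is downloaded, it can be run with llama.cpp. The following is a minimal sketch, assuming the Q4_K_M file from step 2 and a llama.cpp build whose binaries are on your PATH; adjust -ngl to the number of layers your GPU can hold (use 0 for CPU-only).

    # Interactive chat using the chat template embedded in the GGUF;
    # in conversation mode, -p is used as the system prompt.
    llama-cli -m ./Phi-3.5-mini-instruct-Q4_K_M.gguf \
        --conversation \
        -p "You are a helpful assistant." \
        -c 4096 \
        -ngl 32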

License

The model is licensed under the MIT License. The full license text is available in the model repository.
