TINY CLIP

Introduction

This is a smaller version of the CLIP model, trained specifically for English. It is approximately 8 times smaller than the original CLIP, a reduction achieved by pairing a smaller text model (microsoft/xtremedistil-l6-h256-uncased) with a smaller vision model (edgenext_small). For a detailed guide on training CLIP, refer to this blog.

Architecture

TINY CLIP follows the standard CLIP dual-encoder design: the distilled text encoder (microsoft/xtremedistil-l6-h256-uncased) and the lightweight vision encoder (edgenext_small) each map their input into a shared embedding space, which is what enables zero-shot image classification while keeping the parameter count low.
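
The snippet below is a minimal sketch of this layout, assuming the text tower is loaded through Hugging Face transformers and the vision tower through timm; the class names, projection heads, and embedding width are illustrative and are not taken from the repository's models.py:

    # Illustrative dual-encoder sketch (not the repo's exact code):
    # a distilled BERT text tower and an EdgeNeXt vision tower, each
    # followed by a linear projection into a shared embedding space.
    import torch.nn as nn
    import timm
    from transformers import AutoModel

    class TextEncoder(nn.Module):
        def __init__(self, embed_dim=256):
            super().__init__()
            self.backbone = AutoModel.from_pretrained(
                "microsoft/xtremedistil-l6-h256-uncased")
            self.proj = nn.Linear(self.backbone.config.hidden_size, embed_dim)

        def forward(self, input_ids, attention_mask):
            out = self.backbone(input_ids=input_ids,
                                attention_mask=attention_mask)
            cls = out.last_hidden_state[:, 0]  # [CLS] token embedding
            return self.proj(cls)

    class VisionEncoder(nn.Module):
        def __init__(self, embed_dim=256):
            super().__init__()
            # num_classes=0 makes timm return pooled features, not logits
            self.backbone = timm.create_model(
                "edgenext_small", pretrained=True, num_classes=0)
            self.proj = nn.Linear(self.backbone.num_features, embed_dim)

        def forward(self, pixels):
            return self.proj(self.backbone(pixels))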

Training

The training script for TINY CLIP can be found on Kaggle. Training distils the larger CLIP model into the compact text and vision encoders described above, aligning their embeddings so that matching image/text pairs score highly.
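
As background, CLIP-style training optimises a symmetric contrastive (InfoNCE) loss that pulls matching text/image pairs together and pushes mismatched pairs apart. The sketch below shows that objective in isolation; the function name and temperature value are illustrative, and any distillation terms the actual Kaggle script adds are not shown:

    # Symmetric contrastive loss used in CLIP-style training
    # (a sketch; the actual training script may differ in details).
    import torch
    import torch.nn.functional as F

    def clip_loss(text_emb, image_emb, temperature=0.07):
        # L2-normalise both embedding batches
        text_emb = F.normalize(text_emb, dim=-1)
        image_emb = F.normalize(image_emb, dim=-1)
        # Cosine-similarity logits for every text/image pair in the batch
        logits = text_emb @ image_emb.t() / temperature
        # Matching pairs sit on the diagonal
        targets = torch.arange(logits.size(0), device=logits.device)
        loss_t = F.cross_entropy(logits, targets)      # text -> image
        loss_i = F.cross_entropy(logits.t(), targets)  # image -> text
        return (loss_t + loss_i) / 2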

Guide: Running Locally

To use TINY CLIP locally, follow these steps:

  1. Install Git Large File Storage (LFS):

    git lfs install
    
  2. Clone the repository:

    git clone https://huggingface.co/sachin/tiny_clip
    cd tiny_clip
    
  3. Import and set up the model:

    import models
    text_encoder, tokenizer, vision_encoder, transform = models.get_model()
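
With the encoders loaded in step 3, a zero-shot classification pass might look like the sketch below. The call signatures of text_encoder and tokenizer are assumptions (a Hugging Face-style tokenizer feeding input_ids and attention_mask), and example.jpg is a placeholder path:

    import torch
    import torch.nn.functional as F
    from PIL import Image

    # Assumes text_encoder, tokenizer, vision_encoder, transform from step 3.
    labels = ["a photo of a cat", "a photo of a dog"]
    image = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # placeholder path

    with torch.no_grad():
        # Assumed: tokenizer follows the Hugging Face API and text_encoder
        # accepts input_ids / attention_mask keyword arguments.
        tokens = tokenizer(labels, padding=True, return_tensors="pt")
        text_emb = F.normalize(text_encoder(**tokens), dim=-1)
        image_emb = F.normalize(vision_encoder(image), dim=-1)

    # Cosine similarities -> per-label probabilities
    probs = (image_emb @ text_emb.t()).softmax(dim=-1)
    print(dict(zip(labels, probs[0].tolist())))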
    

For faster training and inference, consider using cloud GPUs from providers such as AWS, Google Cloud Platform, or Azure.

License

This project is licensed under the MIT License, allowing for flexibility in modification and distribution.
