TINY CLIP
Introduction
This is a smaller version of the CLIP model, trained specifically for English. It is approximately 8 times smaller than the original CLIP model, a reduction achieved by pairing a smaller text model (microsoft/xtremedistil-l6-h256-uncased) with a smaller vision model (edgenext_small). For a detailed guide on training CLIP, refer to this blog.
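As a rough sketch of how these two backbones could be wired together, the text tower can be loaded with transformers and the vision tower with timm. The projection heads below are illustrative assumptions, not the repository's exact code:

import timm
import torch
from transformers import AutoModel, AutoTokenizer

# Text tower: a 6-layer, 256-hidden-size distilled BERT.
tokenizer = AutoTokenizer.from_pretrained("microsoft/xtremedistil-l6-h256-uncased")
text_backbone = AutoModel.from_pretrained("microsoft/xtremedistil-l6-h256-uncased")

# Vision tower: num_classes=0 makes timm return pooled features instead of logits.
vision_backbone = timm.create_model("edgenext_small", pretrained=True, num_classes=0)

# Hypothetical projection heads mapping both towers into one shared embedding space.
EMBED_DIM = 256
text_proj = torch.nn.Linear(text_backbone.config.hidden_size, EMBED_DIM)
vision_proj = torch.nn.Linear(vision_backbone.num_features, EMBED_DIM)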
Architecture
The architecture pairs the smaller text encoder with the smaller vision encoder, each projecting into a shared embedding space, so the model can handle zero-shot image classification efficiently while staying within its reduced size budget.
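To make the zero-shot mechanism concrete, here is a minimal sketch (function and variable names are illustrative): each candidate label is embedded as text, and the label whose embedding is most similar to the image embedding wins.

import torch
import torch.nn.functional as F

def zero_shot_classify(image_embed: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    """image_embed: (d,); text_embeds: (num_labels, d) -> probability per label."""
    image_embed = F.normalize(image_embed, dim=-1)   # unit norm, so dot product = cosine
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = text_embeds @ image_embed               # cosine similarity to each label
    return logits.softmax(dim=-1)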
Training
The training script for TINY CLIP is available on Kaggle. The training process distils the larger CLIP model into a more compact form, using the smaller text and vision encoders described above.
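For reference, the objective that CLIP-style training runs typically optimise is a symmetric contrastive (InfoNCE) loss over a batch of matched image–text pairs. The sketch below assumes already-projected embeddings and a fixed temperature; it is not taken from the Kaggle script.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    """image_embeds, text_embeds: (batch, d); row i of each is a matched pair."""
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = image_embeds @ text_embeds.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched pairs lie on the diagonal; supervise both matching directions.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2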
Guide: Running Locally
To use TINY CLIP locally, follow these steps:
- Install Git Large File Storage (LFS):

  git lfs install
- Clone the repository:

  git clone https://huggingface.co/sachin/tiny_clip
  cd tiny_clip
- Import and set up the model (a usage sketch follows this list):

  import models

  text_encoder, tokenizer, vision_encoder, transform = models.get_model()
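Once the components are loaded, a minimal inference pass might look like the following. The encoder call signatures, return shapes, and the image path are assumptions based on typical CLIP-style repositories, not guarantees about this one:

import torch
import torch.nn.functional as F
from PIL import Image

import models

text_encoder, tokenizer, vision_encoder, transform = models.get_model()

# Hypothetical inputs: a local image and two candidate captions.
image = transform(Image.open("cat.jpg")).unsqueeze(0)      # (1, C, H, W)
tokens = tokenizer(["a photo of a cat", "a photo of a dog"],
                   padding=True, return_tensors="pt")

with torch.no_grad():
    image_embed = vision_encoder(image)                    # assumed shape: (1, d)
    text_embeds = text_encoder(**tokens)                   # assumed shape: (2, d)

scores = F.cosine_similarity(image_embed, text_embeds)     # higher = better caption match
print(scores)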
For better performance during training and inference, consider using cloud GPUs from providers such as AWS, Google Cloud Platform, or Azure.
License
This project is licensed under the MIT License, allowing for flexibility in modification and distribution.