CLIP-ViT-bigG-14-laion2B-39B-b160k

laion

Introduction

The CLIP ViT-bigG/14 model is a zero-shot image classification model developed by LAION. It was trained on the LAION-2B English subset of LAION-5B using OpenCLIP, with the goal of enabling research into zero-shot, arbitrary image classification. The model is intended primarily for research purposes.

Architecture

The model uses a Vision Transformer (ViT) image encoder, specifically the ViT-bigG/14 variant, paired with a text encoder following the CLIP approach: both encoders are trained contrastively on image-text pairs so that matching images and captions map to nearby points in a shared embedding space. This enables tasks such as zero-shot image classification and image-text retrieval.
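
As a rough illustration of how the two encoders are combined, the sketch below scores images against text prompts by cosine similarity in the shared embedding space. The tensors are random stand-ins for the encoder outputs, and the 1280-dimensional embedding width is an assumption for ViT-bigG/14; the scoring step itself works the same way for any width.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the CLIP scoring step, with random tensors standing in
# for the image/text encoder outputs. The 1280-dim joint embedding size is
# an assumption for ViT-bigG/14.
num_images, num_texts, dim = 2, 3, 1280
image_features = F.normalize(torch.randn(num_images, dim), dim=-1)
text_features = F.normalize(torch.randn(num_texts, dim), dim=-1)

# Cosine similarity between every image and every candidate text prompt;
# a softmax over the text axis turns the scores into class probabilities.
logits = 100.0 * image_features @ text_features.T
probs = logits.softmax(dim=-1)
print(probs)  # shape (2, 3), each row sums to 1
```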

Training

Training Data

The model was trained on the 2-billion-sample English subset of the LAION-5B dataset, with additional fine-tuning on the LAION-A subset. This dataset is uncurated and primarily gathered from publicly available internet sources, and it is intended for research exploration rather than commercial use.

Training Procedure

Training was conducted on the stability.ai cluster. Details on the procedure will be provided in a forthcoming blog post on laion.ai.

Guide: Running Locally

  1. Prerequisites: Ensure you have Python and PyTorch installed. Install the necessary libraries, such as transformers and open_clip (published on PyPI as open_clip_torch).

  2. Get the Model Files: Download the model files from the Hugging Face model card page, or let the libraries fetch them automatically from the Hub on first use.

  3. Use the Model: Load and run the model in your Python environment using the Hugging Face and OpenCLIP libraries; a minimal usage sketch follows this list. Further example snippets are typically provided in the model documentation.

  4. GPU Recommendation: Given the model's size, a GPU is recommended for efficient inference; cloud GPU services such as AWS EC2, Google Cloud, or Azure are common options.
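
The following is a minimal sketch of steps 1-3 using OpenCLIP. It assumes the model is published on the Hugging Face Hub as laion/CLIP-ViT-bigG-14-laion2B-39B-b160k and that a local image file cat.jpg exists; install the dependencies first, e.g. with `pip install torch open_clip_torch pillow`.

```python
import torch
import open_clip
from PIL import Image

# Load the model, its preprocessing transform, and the matching tokenizer
# from the Hugging Face Hub (the repository id is an assumption).
repo = "hf-hub:laion/CLIP-ViT-bigG-14-laion2B-39B-b160k"
model, _, preprocess = open_clip.create_model_and_transforms(repo)
tokenizer = open_clip.get_tokenizer(repo)
model.eval()

# Candidate labels for zero-shot classification and an example image.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text = tokenizer(labels)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Softmax over the label axis gives per-label probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

The same checkpoint can also be used through the transformers zero-shot-image-classification pipeline if you prefer to stay within the Hugging Face ecosystem.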

License

The CLIP ViT-bigG/14 model is released under the MIT License, allowing for wide-ranging use and modification, though it is primarily intended for research purposes.
