microsoft/swin-base-patch4-window7-224-in22k
Introduction
The Swin Transformer model is a base-sized Vision Transformer pre-trained on the ImageNet-21k dataset, which consists of 14 million images across 21,841 classes. This model was introduced in the paper "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" by Liu et al. It is designed to serve as a general-purpose backbone for image classification and dense prediction tasks such as object detection and semantic segmentation.
Architecture
The Swin Transformer builds hierarchical feature maps by merging image patches in deeper layers. Because self-attention is computed only within local, non-overlapping windows, its computational complexity is linear in input image size; shifting the window partition between consecutive layers lets information flow across window boundaries. This contrasts with earlier vision Transformers, which compute global self-attention, produce a single low-resolution feature map, and incur quadratic complexity in image size.
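To make the linear-complexity claim concrete, the sketch below partitions a feature map into non-overlapping 7×7 windows, the window size this checkpoint uses. Attention is then computed per window, so the cost grows with the number of windows rather than with the square of the total token count. The helper function and tensor shapes here are illustrative, not part of the transformers API.

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into (num_windows*B, window_size**2, C) token groups."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    # Bring the two window-grid axes together, then flatten each window to a token sequence.
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

# A 56x56 feature map with 128 channels: stage 1 of Swin-base at 224x224 input.
x = torch.randn(1, 56, 56, 128)
windows = window_partition(x, window_size=7)
print(windows.shape)  # torch.Size([64, 49, 128]): 64 windows of 49 tokens each
# Self-attention runs over 49 tokens per window, so cost scales linearly with
# image area (number of windows) instead of quadratically with all 3,136 tokens.
```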
Training
The model is pre-trained on the extensive ImageNet-21k dataset, allowing it to recognize a wide variety of image classes. It is one of several Swin Transformer checkpoints released by Microsoft, which vary in model size and input resolution.
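As a quick sanity check on the size of that label space, one can inspect the checkpoint's configuration; assuming the hosted config matches this card, the label mapping should contain 21,841 entries.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/swin-base-patch4-window7-224-in22k")
# The classification head of this checkpoint covers the full ImageNet-21k label set.
print(len(config.id2label))  # expected: 21841
```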
Guide: Running Locally
To use this model for image classification, follow these steps:
1. Install Dependencies: Ensure you have the `transformers` library from Hugging Face and `Pillow` (PIL) installed. Use `pip install transformers pillow` if needed.
2. Load the Model and Processor:

```python
from transformers import AutoImageProcessor, SwinForImageClassification
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("microsoft/swin-base-patch4-window7-224-in22k")
model = SwinForImageClassification.from_pretrained("microsoft/swin-base-patch4-window7-224-in22k")
```
3. Prepare the Input and Run Inference:

```python
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
4. Cloud GPUs: For enhanced performance, especially with larger models or batched inputs, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure (see the sketch after this list).
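As a minimal sketch of GPU inference, assuming PyTorch with CUDA is available, the pipeline above can be moved to the accelerator; the device-selection logic is illustrative rather than required by the model.

```python
import torch

# Fall back to CPU when no CUDA device is present.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

inputs = processor(images=image, return_tensors="pt").to(device)
with torch.no_grad():  # disable autograd for faster, lower-memory inference
    logits = model(**inputs).logits
print("Predicted class:", model.config.id2label[logits.argmax(-1).item()])
```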
License
The Swin Transformer model is released under the Apache 2.0 license, which permits both commercial and non-commercial use, modification, and distribution, provided the license and copyright notices are retained.