microsoft/swin-tiny-patch4-window7-224
Swin Transformer (Tiny-Sized Model)
Introduction
The Swin Transformer is a Vision Transformer model trained on the ImageNet-1k dataset at a resolution of 224x224. It was introduced in the paper "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" by Liu et al. and is designed for image classification tasks.
Architecture
The Swin Transformer builds hierarchical feature maps by merging image patches in deeper layers, and it achieves computation complexity that is linear in input image size by computing self-attention only within local windows. Unlike previous Vision Transformers, which produce feature maps at a single low resolution and have quadratic computation complexity in image size, this design lets the Swin Transformer serve as an efficient general-purpose backbone for both image classification and dense recognition tasks.
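The window-based attention idea can be illustrated with a small sketch. This is a simplified NumPy illustration (not the model's actual implementation, which also handles batching and padding): a feature map of size H x W is split into non-overlapping 7x7 windows, and self-attention is computed within each window independently, so the cost grows with the number of windows rather than quadratically with H x W.

```python
import numpy as np

def window_partition(x, window_size=7):
    """Split a (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    Simplified sketch of Swin-style window partitioning.
    """
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)

# A 56x56x96 feature map, as produced by 4x4 patch embedding of a 224x224 image
feat = np.random.rand(56, 56, 96)
windows = window_partition(feat)
print(windows.shape)  # (64, 7, 7, 96): 8x8 = 64 windows of 7x7 tokens each

# Each of the 64 windows attends over only 49 tokens (49^2 attention pairs),
# instead of one global attention over all 56*56 = 3136 tokens (3136^2 pairs).
```

Because the window size is fixed, doubling the image area doubles the number of windows (and hence the attention cost), rather than quadrupling it as global attention would.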
Training
The model is trained on the ImageNet-1k dataset, which consists of 1,000 classes. The architecture supports efficient computation through a hierarchical design and shifted windows technique, enabling it to handle high-resolution images effectively.
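The shifted-windows technique mentioned above alternates between two partitionings: every other block cyclically shifts the feature map by half a window before partitioning, so information can flow across window boundaries. A rough sketch of the cyclic shift (illustrative NumPy on a toy single-channel map; the real model additionally applies attention masks to the wrapped-around regions):

```python
import numpy as np

window = 7
shift = window // 2  # shift by half a window (3 tokens)

feat = np.arange(14 * 14).reshape(14, 14)  # toy 14x14 feature map, one channel

# Cyclic shift: rows/columns that fall off one edge wrap around to the other.
shifted = np.roll(feat, shift=(-shift, -shift), axis=(0, 1))

# After shifting, the regular 7x7 partition now covers regions that straddled
# the original window boundaries, letting neighboring windows exchange information.
print(shifted[0, 0], feat[shift, shift])  # both 45: the map moved up-left by 3
```

The shift is undone (rolled back) after the attention block, so alternating blocks see the two complementary partitionings of the same feature map.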
Guide: Running Locally
To use this model for image classification, follow the steps below:
1. Install Dependencies: Ensure you have the `transformers`, `PIL` (Pillow), and `requests` libraries installed.

2. Load the Model and Processor:

   ```python
   from transformers import AutoImageProcessor, AutoModelForImageClassification
   from PIL import Image
   import requests

   # Load a sample image from the COCO validation set
   url = "http://images.cocodataset.org/val2017/000000039769.jpg"
   image = Image.open(requests.get(url, stream=True).raw)

   # Load the image processor and the classification model
   processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
   model = AutoModelForImageClassification.from_pretrained("microsoft/swin-tiny-patch4-window7-224")

   # Preprocess the image and run a forward pass
   inputs = processor(images=image, return_tensors="pt")
   outputs = model(**inputs)
   logits = outputs.logits

   # The model predicts one of the 1,000 ImageNet-1k classes
   predicted_class_idx = logits.argmax(-1).item()
   print("Predicted class:", model.config.id2label[predicted_class_idx])
   ```
3. Cloud GPUs: For large-scale tasks or faster processing, consider using cloud GPU services such as AWS EC2, Google Cloud, or Microsoft Azure.
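The raw logits from the classification step can be turned into class probabilities with a softmax. A minimal sketch using dummy logits, so it runs without downloading the model (with the real model you would pass `outputs.logits` converted to a NumPy array instead; the class index 281 below is just an illustrative assumption):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Dummy logits standing in for outputs.logits (shape: batch x 1000 classes)
logits = np.zeros((1, 1000))
logits[0, 281] = 8.0  # pretend one class scores far above the rest

probs = softmax(logits)
top5 = np.argsort(probs[0])[::-1][:5]  # indices of the five highest-probability classes
print("top-1 index:", top5[0], "probability:", round(float(probs[0, top5[0]]), 3))
```

With the real model, `model.config.id2label` maps each of these indices back to a human-readable ImageNet-1k class name.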
License
The model is provided under the Apache-2.0 License. This allows for both personal and commercial use, modification, and distribution, with proper attribution to the original authors.