# swinv2-tiny-patch4-window8-256

Released by Microsoft.

## Introduction
Swin Transformer V2 (tiny-sized) is a model pre-trained on the ImageNet-1k dataset at a resolution of 256x256 for image classification. Introduced in the paper "Swin Transformer V2: Scaling Up Capacity and Resolution" by Liu et al., it serves as a general-purpose backbone for both image classification and dense recognition tasks.
## Architecture
Swin Transformer V2 builds hierarchical feature maps by merging image patches in deeper layers. It computes self-attention only within each local window, giving computational complexity that is linear in the input image size rather than quadratic. The model introduces three key improvements over Swin V1: residual post-norm combined with cosine attention for training stability, a log-spaced continuous relative position bias that transfers effectively to high-resolution tasks, and a self-supervised pre-training method, SimMIM, that reduces the need for large labeled datasets.
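The window-based attention described above can be illustrated with a minimal sketch. The `window_partition` helper below is an illustrative reimplementation of the standard Swin partitioning step, not the `transformers` library's internal code: attention is computed per fixed-size window, so cost grows linearly with the number of windows (i.e., with image area).

```python
import torch

def window_partition(x, window_size):
    """Split a feature map (B, H, W, C) into non-overlapping windows of
    shape (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

# Attention inside one 8x8 window costs O(64^2) regardless of image size;
# only the *number* of windows grows with image area, hence linear complexity.
x = torch.randn(1, 32, 32, 96)   # toy 32x32 feature map with 96 channels
windows = window_partition(x, 8)
print(windows.shape)             # -> torch.Size([16, 8, 8, 96])
```

Doubling the image side quadruples the number of windows but leaves the per-window attention cost unchanged.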
## Training
The Swin Transformer V2 utilizes a self-supervised pre-training method, SimMIM, allowing it to perform well with fewer labeled images during training. The model is pre-trained on low-resolution images and can be effectively transferred to downstream tasks involving high-resolution inputs.
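The SimMIM objective can be sketched in a few lines. This is a hedged toy illustration of the idea only (mask a large fraction of patches, then regress the raw pixels of the masked patches with an L1 loss); the tensor names and the `simmim_loss` helper are illustrative, not the actual pre-training code.

```python
import torch

def simmim_loss(pixels, predictions, mask):
    """pixels, predictions: (B, N, D) flattened patches; mask: (B, N), 1 = masked.
    Returns the mean L1 error over masked patches only."""
    per_patch = (predictions - pixels).abs().mean(-1)        # L1 error per patch
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)

B, N, D = 2, 64, 48                      # toy batch: 64 patches of 48 values each
pixels = torch.randn(B, N, D)
mask = (torch.rand(B, N) < 0.6).float()  # SimMIM masks a large fraction of patches
predictions = torch.randn(B, N, D)       # stand-in for the decoder's output
loss = simmim_loss(pixels, predictions, mask)
print(loss.item())
```

Only masked patches contribute to the loss, so the model must infer missing content from visible context rather than copy it through.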
## Guide: Running Locally
To use the Swin Transformer V2 model for image classification on a local machine, follow these steps:
1. Install the `transformers` library:

   ```bash
   pip install transformers
   ```
2. Use the following Python code to classify an image:

   ```python
   from transformers import AutoImageProcessor, AutoModelForImageClassification
   from PIL import Image
   import requests

   url = "http://images.cocodataset.org/val2017/000000039769.jpg"
   image = Image.open(requests.get(url, stream=True).raw)

   processor = AutoImageProcessor.from_pretrained("microsoft/swinv2-tiny-patch4-window8-256")
   model = AutoModelForImageClassification.from_pretrained("microsoft/swinv2-tiny-patch4-window8-256")

   inputs = processor(images=image, return_tensors="pt")
   outputs = model(**inputs)
   logits = outputs.logits

   predicted_class_idx = logits.argmax(-1).item()
   print("Predicted class:", model.config.id2label[predicted_class_idx])
   ```
3. For optimal performance, consider using cloud GPUs available through platforms such as AWS, Google Cloud, or Azure.
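If you want class probabilities rather than a single label, you can extend the snippet above by applying a softmax over the logits and taking the top few entries. This is a sketch of one reasonable way to do it; the top-5 choice and the formatting are illustrative.

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("microsoft/swinv2-tiny-patch4-window8-256")
model = AutoModelForImageClassification.from_pretrained("microsoft/swinv2-tiny-patch4-window8-256")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():                    # inference only, no gradients needed
    logits = model(**inputs).logits

# Softmax over the 1000 ImageNet-1k classes, then report the five most likely.
probs = logits.softmax(-1)[0]
top5 = probs.topk(5)
for p, idx in zip(top5.values, top5.indices):
    print(f"{model.config.id2label[idx.item()]}: {p.item():.3f}")
```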
## License
This model is released under the Apache 2.0 license, which allows for both personal and commercial use with proper attribution.