Stanford-Car-ViT-Patch16
Introduction
The Stanford-Car-ViT-Patch16 model is a fine-tuned version of the ViT base model specifically trained on the Stanford Car dataset. It achieves approximately 86% accuracy on the testing set, serving as a solid baseline for further tuning.
Architecture
The model is based on the Vision Transformer (ViT) architecture, specifically the vit-base-patch16-224 model. This transformer-based architecture is particularly well suited to image classification tasks.
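As a sanity check on the geometry implied by the model name, a 224×224 input is split into 16×16 patches, giving 14×14 = 196 patch tokens plus one [CLS] token. A minimal sketch of that arithmetic:

```python
# Patch geometry for vit-base-patch16-224 (values taken from the model name).
image_size = 224
patch_size = 16

patches_per_side = image_size // patch_size  # 224 / 16 = 14
num_patches = patches_per_side ** 2          # 14 * 14 = 196 patch tokens
sequence_length = num_patches + 1            # +1 for the [CLS] token

print(patches_per_side, num_patches, sequence_length)  # 14 196 197
```

Incidentally, the 196 patch tokens match the dataset's 196 car classes only by coincidence; the two numbers are unrelated.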
Training
The model is trained on the Stanford Car dataset, which contains 16,185 images across 196 car classes. These classes are detailed at the Make, Model, and Year level. The dataset is divided into 8,144 training images, 6,041 testing images, and 2,000 validation images. Note that newer car models are not included in this dataset.
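The split sizes quoted above can be checked against the stated dataset total; a quick sketch using only the numbers from this card:

```python
# Split sizes as stated in this card.
splits = {"train": 8_144, "test": 6_041, "validation": 2_000}

total = sum(splits.values())
print(total)  # 16185, matching the 16,185 images in the dataset
```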
Guide: Running Locally
To use the model locally, follow these steps:
- Install the transformers library if not already installed: pip install transformers
- Use the following Python code to load the model and feature extractor:

from transformers import AutoFeatureExtractor, AutoModelForImageClassification

extractor = AutoFeatureExtractor.from_pretrained("therealcyberlord/stanford-car-vit-patch16")
model = AutoModelForImageClassification.from_pretrained("therealcyberlord/stanford-car-vit-patch16")
- For more efficient processing, consider using cloud GPUs from platforms like AWS, Google Cloud, or Azure.
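After a forward pass, the model returns one logit per class; the predicted car class is the argmax after a softmax, which the model's id2label mapping turns into a readable name. The post-processing step can be sketched without downloading the model; the logits below are made-up placeholder values (a real output has 196 entries, one per class):

```python
import math

# Hypothetical logits for a 3-class example, standing in for the real
# 196-class output of the classifier head.
logits = [0.5, 2.0, -1.0]

# Softmax turns raw logits into probabilities that sum to 1.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# The predicted class index is the argmax; id2label maps it to a car name.
predicted_idx = max(range(len(probs)), key=probs.__getitem__)
print(predicted_idx)  # 1, since index 1 has the highest logit
```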
License
The model is available under the Apache 2.0 license, which allows for both personal and commercial use with appropriate credit to the original authors.