vit-base-beans
nateraw

Introduction
VIT-BASE-BEANS is a fine-tuned version of Google's vit-base-patch16-224-in21k
model, tailored for image classification on the beans dataset (classifying photos of bean leaves as healthy, angular leaf spot, or bean rust). The model achieves 97.74% accuracy on the evaluation set.
Architecture
The model is built upon the Vision Transformer (ViT) architecture, which is designed for image classification tasks. It leverages the transformer model's capabilities to process image patches as sequences, allowing it to capture spatial relationships effectively.
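To make the patch-as-sequence idea concrete, here is a minimal sketch of the arithmetic for this checkpoint's configuration (16x16 patches on 224x224 inputs); the variable names are illustrative, not taken from the library:

```python
# Sketch: how ViT turns a 224x224 image into a token sequence.
# Values match vit-base-patch16-224; variable names are illustrative.
image_size = 224                                # input resolution
patch_size = 16                                 # each patch is 16x16 pixels
num_patches = (image_size // patch_size) ** 2   # 14 * 14 = 196 patches
patch_dim = patch_size * patch_size * 3         # 768 raw values per RGB patch
# Each flattened patch is linearly projected to the hidden size, and a
# [CLS] token is prepended, giving a sequence of 197 embeddings that the
# transformer encoder processes like tokens in a sentence.
print(num_patches, patch_dim)                   # 196 768
```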
Training
Training Procedure
The model was trained using the Adam optimizer with the following hyperparameters (a configuration sketch follows the list):
- Learning Rate: 2e-05
- Train Batch Size: 8
- Eval Batch Size: 8
- Seed: 1337
- Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- Learning Rate Scheduler: Linear
- Number of Epochs: 5.0
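As a minimal sketch, these settings map onto Hugging Face's TrainingArguments as shown below. This only mirrors the hyperparameters listed above; the output directory and the rest of the training setup are assumptions, not the author's published training script.

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# The output_dir and surrounding Trainer wiring are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vit-base-beans",     # assumed output directory
    learning_rate=2e-5,              # Learning Rate: 2e-05
    per_device_train_batch_size=8,   # Train Batch Size: 8
    per_device_eval_batch_size=8,    # Eval Batch Size: 8
    seed=1337,                       # Seed: 1337
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # Adam epsilon=1e-08
    lr_scheduler_type="linear",      # Linear LR scheduler
    num_train_epochs=5.0,            # Number of Epochs: 5.0
)
```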
Training Results
- Training Loss: Decreased from 0.2809 in the first epoch to 0.0923 in the fourth epoch.
- Validation Loss: Fluctuated slightly, reaching 0.0942 in the third epoch, with an accuracy of 97.74%.
- Evaluation Metrics: The model achieved a loss of 0.0942 and an accuracy of 97.74% on the evaluation set; a sketch for re-checking these numbers follows below.
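The reported accuracy can be checked against the validation split of the beans dataset. The snippet below is a sketch, assuming the datasets library is installed (`pip install datasets`); it is not the author's evaluation script.

```python
# Sketch: re-checking validation accuracy on the beans dataset.
# Assumes `pip install datasets`; not the original evaluation script.
import torch
from datasets import load_dataset
from transformers import ViTForImageClassification, ViTFeatureExtractor

dataset = load_dataset("beans", split="validation")
model = ViTForImageClassification.from_pretrained("nateraw/vit-base-beans")
feature_extractor = ViTFeatureExtractor.from_pretrained("nateraw/vit-base-beans")
model.eval()

correct = 0
for example in dataset:
    inputs = feature_extractor(images=example["image"], return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    correct += int(logits.argmax(-1).item() == example["labels"])
print(f"accuracy: {correct / len(dataset):.4f}")
```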
Guide: Running Locally
To run the VIT-BASE-BEANS model locally, follow these steps:
- Install Dependencies: Ensure you have the necessary Python libraries installed, including PyTorch and Transformers:

```bash
pip install torch transformers
```
- Clone the Repository: Clone the model repository from Hugging Face (the weights are stored with Git LFS, so running `git lfs install` first may be needed):

```bash
git clone https://huggingface.co/nateraw/vit-base-beans
```
- Load the Model: Use the Transformers library to load the model and feature extractor:

```python
from transformers import ViTForImageClassification, ViTFeatureExtractor

model = ViTForImageClassification.from_pretrained('nateraw/vit-base-beans')
feature_extractor = ViTFeatureExtractor.from_pretrained('nateraw/vit-base-beans')
```
- Inference: Load an image and perform inference:

```python
import requests
import torch
from PIL import Image

url = "https://example.com/path/to/image.jpg"  # placeholder; substitute a real image URL
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
predicted_label = model.config.id2label[logits.argmax(-1).item()]
```
- Evaluate on Cloud: For faster training or batch inference, consider using cloud GPUs such as AWS EC2 P3 instances or Google Cloud's AI Platform.
License
The VIT-BASE-BEANS model is licensed under the Apache 2.0 License, allowing for both personal and commercial use with appropriate attribution.