vit-base-beans

nateraw

Introduction

vit-base-beans is a fine-tuned version of Google's vit-base-patch16-224-in21k model, tailored for image classification on the beans dataset. The model achieves 97.74% accuracy on the evaluation set.

Architecture

The model is built upon the Vision Transformer (ViT) architecture, which splits an image into fixed-size patches and processes them as a token sequence with a standard transformer encoder. Self-attention over the patch sequence lets the model capture global spatial relationships across the image.
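
As a rough sketch of that patch arithmetic for the underlying vit-base-patch16-224 configuration (the sizes below come from the base checkpoint, not from this card):

    # vit-base-patch16-224: 224x224 RGB input, 16x16 patches, 768-dim embeddings
    image_size, patch_size, hidden_dim = 224, 16, 768
    num_patches = (image_size // patch_size) ** 2  # (224 // 16) ** 2 = 196
    # Each 16x16x3 patch is flattened (16 * 16 * 3 = 768 values) and linearly
    # projected to a hidden_dim embedding; a [CLS] token is prepended, so the
    # encoder sees a sequence of 196 + 1 = 197 tokens.
    print(num_patches)  # 196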

Training

Training Procedure

The model was trained using the Adam optimizer with the following hyperparameters (a hedged TrainingArguments sketch follows the list):

  • Learning Rate: 2e-05
  • Train Batch Size: 8
  • Eval Batch Size: 8
  • Seed: 1337
  • Optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • Learning Rate Scheduler: Linear
  • Number of Epochs: 5.0
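
Assuming the standard Hugging Face Trainer was used (the card does not include the training script), these settings map onto TrainingArguments roughly as follows; output_dir is a placeholder, and unlisted arguments keep their library defaults:

    from transformers import TrainingArguments

    # Hedged reconstruction of the reported hyperparameters, not the author's script
    training_args = TrainingArguments(
        output_dir="vit-base-beans",  # placeholder
        learning_rate=2e-5,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        seed=1337,
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-8,
        lr_scheduler_type="linear",
        num_train_epochs=5.0,
    )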

Training Results

  • Training Loss: Decreased from 0.2809 in the first epoch to 0.0923 in the fourth epoch.
  • Validation Loss: Fluctuated slightly, reaching 0.0942 in the third epoch, with an accuracy of 97.74%.
  • Evaluation Metrics: The model achieved a loss of 0.0942 and an accuracy of 97.74% on the evaluation set.
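
To sanity-check these numbers locally, here is a minimal sketch that re-runs the evaluation, assuming the card's evaluation set is the validation split of the beans dataset (with image and labels columns):

    import torch
    from datasets import load_dataset
    from transformers import ViTForImageClassification, ViTImageProcessor

    model = ViTForImageClassification.from_pretrained("nateraw/vit-base-beans")
    image_processor = ViTImageProcessor.from_pretrained("nateraw/vit-base-beans")
    dataset = load_dataset("beans", split="validation")

    # Classify each validation image and count exact label matches
    correct = 0
    for example in dataset:
        inputs = image_processor(images=example["image"], return_tensors="pt")
        with torch.no_grad():
            prediction = model(**inputs).logits.argmax(-1).item()
        correct += int(prediction == example["labels"])
    print(f"accuracy: {correct / len(dataset):.4f}")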

Guide: Running Locally

To run the vit-base-beans model locally, follow these steps:

  1. Install Dependencies: Ensure you have the necessary Python libraries installed, including PyTorch, Transformers, Pillow, and Requests (the latter two are used in the inference step). Use the following command:

    pip install torch transformers pillow requests
    
  2. Clone the Repository (optional): Clone the model repository from Hugging Face; note that from_pretrained in the next step downloads the weights automatically, so this is only needed if you want to inspect the files directly:

    git clone https://huggingface.co/nateraw/vit-base-beans
    
  3. Load the Model: Use the Transformers library to load the model and its image processor (ViTImageProcessor is the current name for the deprecated ViTFeatureExtractor):

    from transformers import ViTForImageClassification, ViTImageProcessor

    # Download the fine-tuned weights and preprocessing configuration from the Hub
    model = ViTForImageClassification.from_pretrained('nateraw/vit-base-beans')
    image_processor = ViTImageProcessor.from_pretrained('nateraw/vit-base-beans')
    
  4. Inference: Load an image, preprocess it, and map the top logit to a class name:

    import torch
    import requests
    from PIL import Image

    # Replace with the URL (or local path) of the image to classify
    url = "https://example.com/path/to/image.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    # Preprocess and run a forward pass without tracking gradients
    inputs = image_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # id2label maps class indices to the beans dataset's label names
    print(model.config.id2label[logits.argmax(-1).item()])
    
  5. Evaluate on Cloud: For faster training and inference, consider using cloud GPUs such as AWS EC2 P3 instances or Google Cloud's AI Platform; a minimal device-placement sketch follows.
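
On a GPU-backed instance, the only change to the step 4 snippet is moving the model and inputs onto the device; a minimal sketch, reusing the model and inputs names from above:

    import torch

    # Use a GPU when available, otherwise fall back to the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        logits = model(**inputs).logits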

License

The vit-base-beans model is licensed under the Apache 2.0 License, allowing both personal and commercial use with appropriate attribution.
