Introduction

The nateraw/food model is a fine-tuned version of google/vit-base-patch16-224-in21k, trained on the food101 dataset. It is designed for image classification and achieves a validation accuracy of 0.8913 with a loss of 0.4501 on the validation set.

Architecture

The model uses the Vision Transformer (ViT) architecture, which splits each input image into 16x16 pixel patches and processes the resulting patch sequence with a standard Transformer encoder. It starts from the google/vit-base-patch16-224-in21k checkpoint, pre-trained on ImageNet-21k at 224x224 resolution, and is fine-tuned with a classification head for the 101 food categories in food101.
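
The patch size, input resolution, and number of output classes can be confirmed from the checkpoint's configuration. A minimal sketch, assuming the transformers package is installed:

    from transformers import ViTConfig

    # Inspect the fine-tuned checkpoint's architecture parameters.
    config = ViTConfig.from_pretrained("nateraw/food")
    print(config.image_size)     # input resolution, expected 224
    print(config.patch_size)     # patch side length, expected 16
    print(len(config.id2label))  # number of food101 classes, expected 101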

Training

The model was trained using the following hyperparameters (reproduced as a TrainingArguments sketch after the list):

  • Learning Rate: 0.0002
  • Train Batch Size: 128
  • Eval Batch Size: 128
  • Seed: 1337
  • Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
  • LR Scheduler Type: Linear
  • Number of Epochs: 5.0
  • Mixed Precision Training: Native AMP
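
These settings correspond to standard Hugging Face TrainingArguments fields. A minimal sketch of how they could be expressed (the output_dir value is a placeholder, not taken from the original run):

    from transformers import TrainingArguments

    # Hypothetical reconstruction of the hyperparameters listed above;
    # output_dir is a placeholder, not the path used in the original run.
    training_args = TrainingArguments(
        output_dir="./vit-food101",
        learning_rate=2e-4,
        per_device_train_batch_size=128,
        per_device_eval_batch_size=128,
        seed=1337,
        num_train_epochs=5.0,
        lr_scheduler_type="linear",
        adam_beta1=0.9,
        adam_beta2=0.999,
        adam_epsilon=1e-8,
        fp16=True,  # Native AMP mixed precision
    )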

The training was performed using the following framework versions (a quick version check follows the list):

  • Transformers: 4.9.0.dev0
  • PyTorch: 1.9.0+cu102
  • Datasets: 1.9.1.dev0
  • Tokenizers: 0.10.3
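
Exact dev versions should not be required for inference; newer releases generally work. To compare your environment against the list above:

    import transformers, torch, datasets, tokenizers

    # Print installed versions to compare with those listed above.
    print("Transformers:", transformers.__version__)
    print("PyTorch:", torch.__version__)
    print("Datasets:", datasets.__version__)
    print("Tokenizers:", tokenizers.__version__)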

Guide: Running Locally

To run this model locally, follow these steps:

  1. Clone the Repository: Note that the model weights are stored with Git LFS, so install it before cloning:

    git clone https://huggingface.co/nateraw/food
    cd food
    
  2. Install Dependencies: Ensure you have the necessary Python packages installed:

    pip install transformers torch datasets Pillow requests
    
  3. Load and Run the Model: Use the following Python script to load and test the model (a pipeline-based alternative is sketched after this list):

    from transformers import ViTForImageClassification, ViTFeatureExtractor
    from PIL import Image
    import requests

    # Replace with the URL of the image to classify.
    url = "URL_TO_IMAGE"
    image = Image.open(requests.get(url, stream=True).raw)

    # Load the preprocessor and the fine-tuned model from the Hub.
    feature_extractor = ViTFeatureExtractor.from_pretrained("nateraw/food")
    model = ViTForImageClassification.from_pretrained("nateraw/food")

    # Preprocess the image and run a forward pass.
    inputs = feature_extractor(images=image, return_tensors="pt")
    outputs = model(**inputs)
    logits = outputs.logits

    # Map the highest-scoring logit to its human-readable label.
    predicted_class = logits.argmax(-1).item()
    print("Predicted class:", model.config.id2label[predicted_class])
    
  4. Consider Using Cloud GPUs: For optimal performance, especially during training or large-scale inference, consider using cloud-based GPUs like AWS EC2, Google Cloud Platform, or Azure.
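
As a lighter-weight alternative to the script in step 3, the same checkpoint also works with the high-level pipeline API; a minimal sketch (the image path is a placeholder):

    from transformers import pipeline

    # Build an image-classification pipeline around the checkpoint.
    classifier = pipeline("image-classification", model="nateraw/food")

    # Accepts a local path or URL; the path below is a placeholder.
    for result in classifier("path/to/food_photo.jpg"):
        print(result["label"], round(result["score"], 4))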

License

This model is licensed under the Apache 2.0 License.
