autotrain_fashion_mnist_vit_base
Introduction
The autotrain_fashion_mnist_vit_base model is designed for image classification tasks, specifically targeting the Fashion MNIST dataset. The model uses the Vision Transformer (ViT) architecture and was developed using the AutoTrain platform, a tool that simplifies the process of training machine learning models.
Architecture
The model leverages the Vision Transformer (ViT) architecture, which is well-suited for image classification tasks. This architecture transforms images into sequences of patches, which are then processed similarly to tokens in NLP models. The model is integrated with the PyTorch library, enabling efficient training and inference.
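As a rough sketch of the patch mechanics described above, the snippet below computes the sequence length a ViT produces from its image and patch sizes. It assumes the stock vit-base defaults of the Transformers ViTConfig (224x224 inputs, 16x16 patches); the actual values for this checkpoint are defined in its config.json.

```python
from transformers import ViTConfig

# Stock vit-base defaults: image_size=224, patch_size=16, hidden_size=768.
# The checkpoint's own config.json is authoritative; this is illustrative.
config = ViTConfig()

# Each image is cut into non-overlapping patches, and each patch becomes
# one "token" in the transformer's input sequence.
num_patches = (config.image_size // config.patch_size) ** 2
print(num_patches)  # 196 patch tokens, each embedded as a 768-dim vector
```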
Training
Training was conducted using the Fashion MNIST dataset, focusing on multi-class classification. The AutoTrain platform facilitated the training process, optimizing the model for high accuracy. Key metrics achieved include:
- Accuracy: 94.73%
- Macro F1 Score: 94.73%
- Loss: 0.1678
- CO2 Emissions: 0.2439 grams
These metrics indicate a robust model performance with low environmental impact.
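For readers unfamiliar with the macro F1 metric, the toy sketch below shows how both scores are conventionally computed, here with scikit-learn and made-up labels rather than the actual evaluation code. On a balanced dataset like Fashion MNIST, accuracy and macro F1 tend to track each other closely, which is consistent with the near-identical numbers reported above.

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy labels standing in for Fashion MNIST's 10 classes.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

print(accuracy_score(y_true, y_pred))             # fraction of correct predictions
print(f1_score(y_true, y_pred, average="macro"))  # per-class F1, averaged equally
```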
Guide: Running Locally
To run this model locally, follow these steps:
- Set up your environment: Ensure you have Python and PyTorch installed. Use a virtual environment for cleaner dependency management.
- Clone the repository: Download the model files from the Hugging Face repository.
- Install dependencies: Run pip install -r requirements.txt if a requirements file is provided; otherwise install the transformers and torch packages directly.
- Load the model: Use the Hugging Face Transformers library to load the model with PyTorch.
- Run inference: Prepare your input data and use the model to make predictions, as shown in the sketch after this list.
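The following is a minimal, illustrative inference script. The repository id is inferred from the model name and should be verified against the actual Hugging Face model page, and example.png is a placeholder for your own image. Note that from_pretrained downloads and caches the model files automatically, which also covers the clone step above.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Assumed repo id, inferred from the model name; verify on the Hub page.
model_id = "abhishek/autotrain_fashion_mnist_vit_base"

# from_pretrained downloads and caches the model files on first use.
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)
model.eval()

# "example.png" is a placeholder; the processor resizes and normalizes
# the image to match what the ViT expects.
image = Image.open("example.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])  # one of the 10 Fashion MNIST classes
```

If you are running on a GPU (see the note below), move the model and inputs to the device with .to("cuda") for faster inference.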
For efficient processing, it is advisable to use cloud GPUs such as AWS EC2 instances or Google Cloud GPUs.
License
The model and its associated files are released under the Apache 2.0 License, allowing for broad use and distribution, provided that proper credit is given and any modifications are documented.