vit-face-expression
Introduction
The vit-face-expression model, published by trpakov, is a Vision Transformer (ViT) fine-tuned for facial emotion recognition. Trained on the FER2013 dataset, it classifies facial images into seven distinct emotion categories: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
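For a quick sense of the task, here is a minimal sketch using the Transformers pipeline API. It assumes the model is hosted on the Hugging Face Hub under the id trpakov/vit-face-expression and that a local image file face.jpg exists; both are assumptions, not details from this document.

```python
# Minimal sketch: classify the emotion in a single face image.
# Assumes the Hub repo id "trpakov/vit-face-expression" and a local
# file "face.jpg" (both assumptions).
from transformers import pipeline

classifier = pipeline("image-classification", model="trpakov/vit-face-expression")
print(classifier("face.jpg"))
# Output is a list of dicts such as [{"label": "happy", "score": 0.87}, ...],
# with scores over the seven emotion classes.
```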
Architecture
The model is based on the Vision Transformer architecture and is fine-tuned from the vit-base-patch16-224-in21k checkpoint. ViT splits an image into fixed-size patches and processes them with a transformer encoder of the kind originally developed for natural language processing, an approach that performs strongly on image classification tasks.
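As a rough illustration of what such a fine-tuning setup looks like in Transformers (the exact configuration used to train this model is not documented here, so treat the details below as assumptions):

```python
# Sketch of initializing a ViT fine-tune from the base checkpoint named above.
# The classification head is newly initialized with 7 output classes; the
# label order follows the list in the introduction and is an assumption.
from transformers import ViTForImageClassification

labels = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",  # base checkpoint
    num_labels=len(labels),               # replace the head with 7 classes
    id2label={i: l for i, l in enumerate(labels)},
    label2id={l: i for i, l in enumerate(labels)},
)
```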
Training
Training the vit-face-expression model involves several steps:
- Dataset: The model is trained on the FER2013 dataset, a collection of facial images each labeled with one of the seven emotion categories.
- Preprocessing: Input images are resized to a consistent input size, pixel values are normalized, and the data is augmented with random transformations; a sketch of such a pipeline follows this list.
- Evaluation: The model achieves 71.13% validation accuracy and 71.16% test accuracy. Known limitations include potential dataset bias and difficulty generalizing to diverse, unseen data.
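The following is a hedged sketch of the preprocessing described above, using torchvision. The 224x224 resize target (matching the base checkpoint's input size) and the specific augmentations are assumptions, not documented settings.

```python
# Sketch of a training-time preprocessing pipeline: resize, augment,
# convert to tensor, normalize. All concrete values are assumptions.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),              # consistent input size
    transforms.RandomHorizontalFlip(),          # example random augmentation
    transforms.RandomRotation(10),              # example random augmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],  # pixel-value normalization
                         std=[0.5, 0.5, 0.5]),
])
```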
Guide: Running Locally
To run the vit-face-expression model locally, follow these steps:
- Clone the Repository: Obtain the model from its repository on Hugging Face.
- Set Up Environment: Ensure that your environment includes Python and the necessary libraries, such as PyTorch and Hugging Face Transformers.
- Download Model Weights: Load the pre-trained model weights from Hugging Face.
- Run Inference: Use the model to perform emotion recognition on your own images, as in the example below.
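The following sketch ties these steps together. It assumes the Hub repo id trpakov/vit-face-expression and a local image face.jpg, and that the dependencies are installed (e.g. pip install torch transformers pillow).

```python
# Local inference sketch with PyTorch and Transformers.
# The repo id and image path are assumptions, not from this document.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "trpakov/vit-face-expression"
processor = AutoImageProcessor.from_pretrained(model_id)   # downloads weights/config
model = AutoModelForImageClassification.from_pretrained(model_id)
model.eval()

image = Image.open("face.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")      # resize + normalize

with torch.no_grad():
    logits = model(**inputs).logits

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])                    # e.g. "happy"
```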
For optimal performance, especially during training or large-scale inference, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
License
The model and code are available under the Apache License 2.0, which allows for both personal and commercial use, modification, and distribution. Ensure compliance with the license terms when using the model in your projects.