joeddav/distilbert-base-uncased-go-emotions-student

Introduction
The DistilBERT-Base-Uncased-Go-Emotions-Student model is a distilled version of a zero-shot classification pipeline, specifically trained on the unlabeled GoEmotions dataset. This model demonstrates how a resource-intensive NLI-based zero-shot model can be distilled into a more efficient student model using only unlabeled data.
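As a quick illustration of the student in use, the sketch below loads it through the standard Transformers text-classification pipeline. The specific arguments (e.g., top_k=None to return a score for every emotion label) are illustrative choices, not requirements from the model card.

```python
from transformers import pipeline

# Minimal sketch: run the distilled student as an ordinary text-classification model.
# top_k=None asks the pipeline to return a score for every emotion label.
classifier = pipeline(
    "text-classification",
    model="joeddav/distilbert-base-uncased-go-emotions-student",
    top_k=None,
)

print(classifier("I feel lucky to be here."))
```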
Architecture
The model is based on the DistilBERT architecture, which is a lightweight version of BERT. It is designed to perform text classification tasks and supports both PyTorch and TensorFlow frameworks.
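To confirm these architecture details, the configuration can be inspected directly. The snippet below is a small sketch using the standard Transformers AutoConfig API; the printed values are whatever the published checkpoint defines.

```python
from transformers import AutoConfig

# Sketch: inspect the student's configuration to verify it is a DistilBERT
# sequence-classification model and to see which emotion labels it predicts.
config = AutoConfig.from_pretrained("joeddav/distilbert-base-uncased-go-emotions-student")

print(config.model_type)                    # expected: "distilbert"
print(config.num_labels)                    # number of GoEmotions classes
print(list(config.id2label.values())[:5])   # first few label names
```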
Training
This model was trained using a script available in the Hugging Face Transformers repository. Training was conducted with mixed precision over 10 epochs, employing default script arguments. The teacher model generated pseudo-labels using single-label classification, despite the GoEmotions dataset allowing multiple labels per instance. This approach allows the model to be trained without requiring labeled data.
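The full distillation recipe lives in the Transformers repository script; the sketch below only illustrates the pseudo-labeling idea described above. The teacher checkpoint (roberta-large-mnli) and the short candidate-label list are assumptions for illustration, not the exact script arguments.

```python
from transformers import pipeline

# Illustrative sketch of single-label pseudo-labeling by an NLI-based zero-shot teacher.
# The teacher checkpoint and label subset here are assumptions, not the script's defaults.
teacher = pipeline("zero-shot-classification", model="roberta-large-mnli")

candidate_labels = ["joy", "anger", "sadness", "gratitude", "neutral"]  # subset of GoEmotions labels
unlabeled_texts = ["I feel lucky to be here.", "This is so frustrating."]

# Take the teacher's top-scoring label for each example (single-label pseudo-labels);
# the DistilBERT student is then trained on these (text, pseudo-label) pairs.
pseudo_labels = [teacher(text, candidate_labels)["labels"][0] for text in unlabeled_texts]
print(list(zip(unlabeled_texts, pseudo_labels)))
```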
Guide: Running Locally
- Set Up Environment: Ensure you have Python and a package manager such as pip or conda installed, then create a virtual environment.
- Install Dependencies: Use the package manager to install PyTorch or TensorFlow along with the Hugging Face Transformers library.
```bash
pip install torch transformers       # For PyTorch
# or
pip install tensorflow transformers  # For TensorFlow
```
- Download Model: Use the Transformers library to load the model locally.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "joeddav/distilbert-base-uncased-go-emotions-student"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
- Run Inference: Tokenize your input text and run it through the model to get predictions.
```python
# Tokenize the input text and run a forward pass; outputs.logits holds the raw class scores.
inputs = tokenizer("I feel lucky to be here.", return_tensors="pt")
outputs = model(**inputs)
```
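Because the student was distilled from single-label pseudo-labels, a softmax over the logits is a reasonable way to read the predictions. The label lookup via config.id2label below is a sketch, not an official post-processing step.

```python
import torch

# Sketch: convert the raw logits into a probability distribution over emotion labels
# (softmax matches the single-label pseudo-labels used during distillation).
probs = torch.softmax(outputs.logits, dim=-1)
top_idx = int(probs.argmax(dim=-1))
print(model.config.id2label[top_idx], float(probs[0, top_idx]))
```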
- Consider Cloud GPUs: For faster processing, especially with large datasets, consider using cloud GPU services such as AWS EC2, Google Cloud, or Azure.
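On a machine with a GPU (local or cloud), the model and inputs can be moved onto the device before the forward pass. This is a generic PyTorch sketch, not specific to any particular cloud provider.

```python
import torch

# Sketch: place the model and inputs on a GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

inputs = tokenizer("I feel lucky to be here.", return_tensors="pt").to(device)
outputs = model(**inputs)
```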
License
The model is available under the MIT License, allowing for wide use and adaptation.