CamemBERT FR COVID Tweet Sentiment Classification
data354/camembert-fr-covid-tweet-sentiment-classification
Introduction
This model is a fine-tuned checkpoint of Yanzhu/bertweetfr-base that classifies the sentiment of French-language tweets about COVID-19 into three categories: negative, neutral, and positive. It achieves an accuracy of 71% on the development set.
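Accuracy here is the share of development-set tweets whose predicted label matches the gold label. The development set is not included in this card, so the sketch below uses a hypothetical toy list of label ids (0 = negative, 1 = neutral, 2 = positive). The helper name `accuracy` is illustrative. With 5 of 7 predictions correct, the toy example yields 5/7 ≈ 0.71, the same ratio as the reported figure.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of examples whose predicted label id equals the gold label id."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Hypothetical toy labels, NOT the real development set.
# 0 = negative, 1 = neutral, 2 = positive
preds = [0, 1, 2, 2, 1, 0, 2]
gold = [0, 1, 1, 2, 1, 2, 2]
print(f"accuracy = {accuracy(preds, gold):.2f}")
```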
Architecture
The model uses the CamemBERT architecture, a RoBERTa-based (and therefore BERT-family) language model for French, with a sequence-classification head on top. It is implemented in PyTorch, can be served through Hugging Face Inference Endpoints, and is intended for text classification.
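Like other sequence-classification models in Transformers, the classification head produces one raw score (logit) per class. The text-classification pipeline then converts these logits to probabilities with a softmax and reports the top class. The plain-Python sketch below illustrates only that last step. The logits are made up, and the label order is assumed from the ids given in the Training section.

```python
import math
from typing import Dict, List, Tuple

# Assumed label order, based on the ids documented for this model.
ID2LABEL: Dict[int, str] = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: List[float]) -> Tuple[str, float]:
    """Return the highest-probability label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one tweet; real values come from the model's forward pass.
label, score = predict([2.1, 0.3, -1.2])
print(label, round(score, 3))
```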
Training
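The model was fine-tuned from the pre-trained bertweetfr-base model to classify each tweet into one of three sentiment categories: 0 (negative), 1 (neutral), and 2 (positive). The fine-tuning data is given as the SST-2 dataset. However, the standard SST-2 corpus consists of English sentences with binary (negative/positive) labels, so on its own it does not match the three-class French tweet setup described here.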
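The pipeline may report readable names or generic ones such as LABEL_0. Which one appears depends on the id2label mapping stored in the model's configuration, which this card does not show. When no mapping is set, Transformers falls back to LABEL_0, LABEL_1, and so on. The hypothetical helper `normalize_label` below maps either form onto the ids listed above.

```python
import re
from typing import Dict

# Label ids as documented for this model.
ID2LABEL: Dict[int, str] = {0: "negative", 1: "neutral", 2: "positive"}


def normalize_label(raw: str) -> str:
    """Map a pipeline label to a readable sentiment name.

    Handles generic names such as "LABEL_2" (used when the config has no
    custom id2label) as well as labels that are already readable.
    """
    match = re.fullmatch(r"LABEL_(\d+)", raw)
    if match:
        return ID2LABEL[int(match.group(1))]
    return raw.lower()


print(normalize_label("LABEL_0"))   # generic id-based label
print(normalize_label("Positive"))  # already readable, just lower-cased
```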
Guide: Running Locally
To run the model locally, follow these steps:
- Install the Transformers library, together with PyTorch (the framework the model is implemented in):
    pip install transformers torch
- Import the necessary classes and load the tokenizer and model:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
    model_id = "data354/camembert-fr-covid-tweet-sentiment-classification"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
- Create a text-classification pipeline for sentiment classification:
    sentiment_classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
- Use the pipeline to classify a tweet:
    # Example French tweet, roughly: "ugh, we're done for. we'll get vaccinated and they'll control us like puppets on strings..."
    result = sentiment_classifier("tchai on est morts. on va se faire vacciner et ils vont contrôler comme les marionnettes avec des fils. d'après les '' ont dit ''...")
    print(result)  # a list containing one {"label": ..., "score": ...} dict
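To score many tweets at once, you can call the same pipeline on a list, which returns one label/score dict per input. The sketch below wraps this in two hypothetical helpers, `build_classifier` and `sentiment_distribution`. The batch size of 32 is an arbitrary assumption, not a value from this card. Because transformers is imported inside `build_classifier`, you can exercise the counting logic with any stand-in callable without downloading the model.

```python
from collections import Counter
from typing import Callable, List

MODEL_ID = "data354/camembert-fr-covid-tweet-sentiment-classification"


def build_classifier(model_id: str = MODEL_ID):
    """Load the text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily, only when actually needed
    return pipeline("text-classification", model=model_id)


def sentiment_distribution(classifier: Callable, tweets: List[str], batch_size: int = 32) -> Counter:
    """Classify tweets in batches and count how often each predicted label occurs."""
    # Called on a list, a text-classification pipeline returns one
    # {"label": ..., "score": ...} dict per input (top-1 label by default).
    results = classifier(tweets, batch_size=batch_size)
    return Counter(r["label"] for r in results)


# Usage (requires transformers, torch and network access):
# counts = sentiment_distribution(build_classifier(), ["tweet 1", "tweet 2"])
```

The returned Counter is keyed by whatever label strings the model's configuration defines.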
A GPU is not required, but it speeds up inference on large volumes of tweets. You can use one locally or through cloud services such as AWS SageMaker, Google Cloud Vertex AI (formerly AI Platform), or Azure Machine Learning.
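The pipeline's device argument takes a GPU index (0 for the first GPU) or -1 for CPU, so one simple approach is to choose the device at load time. The helper names below are hypothetical, and the sketch assumes PyTorch as the backend.

```python
def pick_device() -> int:
    """Return a pipeline device index: 0 for the first CUDA GPU, -1 for CPU."""
    try:
        import torch
    except ImportError:  # PyTorch not installed: fall back to CPU
        return -1
    return 0 if torch.cuda.is_available() else -1


def build_classifier_on_best_device(
    model_id: str = "data354/camembert-fr-covid-tweet-sentiment-classification",
):
    """Load the sentiment pipeline on a GPU when one is visible, otherwise on CPU."""
    from transformers import pipeline  # imported lazily; downloads the model on first use
    return pipeline("text-classification", model=model_id, device=pick_device())


print("pipeline device:", pick_device())
```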
License
This model is released under the Apache 2.0 license.