CAMEMBERT-FR-COVID-TWEET-SENTIMENT-CLASSIFICATION

Introduction

This model is a fine-tuned checkpoint of Yanzhu/bertweetfr-base, aimed at classifying the sentiment of tweets related to COVID-19 in French. The model classifies tweets into three sentiment categories: negative, neutral, and positive, achieving an accuracy of 71% on the development set.

Architecture

The model uses the CamemBERT architecture, a RoBERTa-based language model pre-trained on French text. It is implemented with PyTorch, is compatible with Hugging Face Inference Endpoints, and is designed for text classification tasks.

Training

The model was fine-tuned on the SST-2 dataset, leveraging the pre-trained bertweetfr-base model. The training process involved classifying tweets into one of three sentiment categories: 0 (negative), 1 (neutral), and 2 (positive).
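
The class ids above can be turned into readable names with a small lookup. The following is a minimal sketch: `ID2LABEL` and `decode_label` are illustrative names, and the assumption that the pipeline emits raw labels of the form `LABEL_<id>` (the Transformers default when a checkpoint's config defines no names) should be checked against the actual output.

```python
# Hypothetical helper: maps the class ids described above to names.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def decode_label(raw_label: str) -> str:
    """Convert a raw label such as 'LABEL_2' (or '2') into 'positive'."""
    # Assumption: the class id is the part after the last underscore.
    class_id = int(raw_label.rsplit("_", 1)[-1])
    return ID2LABEL[class_id]


print(decode_label("LABEL_0"))  # negative
```

If the checkpoint already reports human-readable label names, this mapping is unnecessary.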

Guide: Running Locally

To run the model locally, follow these steps:

Install the Transformers library together with PyTorch, which the model relies on:
```
pip install transformers torch
```

Import the necessary classes and load the model and tokenizer:

```
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("data354/camembert-fr-covid-tweet-sentiment-classification")
model = AutoModelForSequenceClassification.from_pretrained("data354/camembert-fr-covid-tweet-sentiment-classification")
```

Create a pipeline for sentiment classification:

```
nlp_topic_classif = pipeline('text-classification', model=model, tokenizer=tokenizer)
```

Use the pipeline to classify a tweet:

```
result = nlp_topic_classif("tchai on est morts. on va se faire vacciner et ils vont contrôler comme les marionnettes avec des fils. d'après les '' ont dit ''...")
print(result)
```
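
The pipeline also accepts a list of texts and returns one dictionary per input, each with `label` and `score` keys. A minimal sketch of tallying predictions over several tweets follows. `count_labels` is a hypothetical helper, and the sample results are made-up placeholders (including the `LABEL_<id>` names and the scores), not real model output.

```python
from collections import Counter


def count_labels(results, min_score=0.0):
    """Count predicted labels, skipping predictions scored below min_score."""
    return Counter(r["label"] for r in results if r["score"] >= min_score)


# Placeholder data shaped like text-classification pipeline output.
fake_results = [
    {"label": "LABEL_0", "score": 0.91},
    {"label": "LABEL_2", "score": 0.55},
    {"label": "LABEL_0", "score": 0.48},
]

print(count_labels(fake_results, min_score=0.5))
```

With the real pipeline, the same helper could be called on the output of classifying a list of tweets.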

For faster inference, consider running the model on a GPU, for example through a cloud platform such as AWS SageMaker, Google Cloud AI Platform, or Azure Machine Learning.
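
When a GPU is present, the pipeline can be placed on it through the `device` argument of `pipeline` (`-1` for CPU, `0` for the first CUDA device). A minimal sketch, where `pick_device` is a hypothetical helper that falls back to the CPU when PyTorch or CUDA is unavailable:

```python
def pick_device() -> int:
    """Return 0 if a CUDA GPU is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1  # PyTorch is not installed: use the CPU index
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("pipeline device index:", device)

# Usage with the model and tokenizer loaded earlier:
# nlp_topic_classif = pipeline("text-classification", model=model,
#                              tokenizer=tokenizer, device=device)
```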

License

This model is released under the Apache 2.0 license.