tblard/tf-allocine
Introduction
TF-ALLOCINÉ is a French sentiment analysis model based on CamemBERT, fine-tuned on user reviews from Allociné.fr. It reaches roughly 97% accuracy and F1 on both the validation and test splits.
Architecture
The model leverages the CamemBERT architecture, a variant of the BERT model tailored for the French language. It is designed for text classification tasks, specifically sentiment analysis in this instance.
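As a quick sanity check, the architecture family and label set can be read from the hosted configuration without downloading any weights (a minimal sketch; the printed values depend on the published config):

from transformers import AutoConfig

# Load only the model configuration, not the weights.
config = AutoConfig.from_pretrained("tblard/tf-allocine")
print(config.model_type)   # architecture family, expected "camembert"
print(config.num_labels)   # number of sentiment classes
print(config.id2label)     # mapping from class index to label name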
Training
The model was fine-tuned on a large-scale dataset scraped from Allociné.fr, a popular French movie review site. The training process yielded high performance metrics with a validation accuracy of 97.39% and a test accuracy of 97.44%.
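Those figures were measured on the full Allociné validation and test splits. The sketch below shows how such an accuracy number is computed over any labeled sample; the two reviews are invented for illustration and stand in for a real held-out split:

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

# Hypothetical labeled reviews standing in for a held-out split.
examples = [
    ("Un film magnifique, je recommande.", "POSITIVE"),
    ("Scénario creux et jeu d'acteur affligeant.", "NEGATIVE"),
]

correct = sum(nlp(text)[0]["label"] == expected for text, expected in examples)
print(f"accuracy: {correct / len(examples):.2%}")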
Guide: Running Locally
To run the TF-ALLOCINÉ model locally, follow these steps:
- Install the Hugging Face Transformers library together with TensorFlow, which the TF model classes require:
pip install tensorflow transformers
- Import the necessary classes and load the pre-trained model and tokenizer:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")
- Create a sentiment analysis pipeline (a note on output format and long inputs follows these steps):
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
- Use the pipeline to analyze text sentiment:
print(nlp("Alad'2 est clairement le meilleur film de l'année 2018."))  # "Alad'2 is clearly the best film of 2018." -> POSITIVE
print(nlp("NUL...A...CHIER ! FIN DE TRANSMISSION."))  # roughly "UTTER CRAP! END OF TRANSMISSION." -> NEGATIVE
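Each call returns a list containing a dictionary with a label and a confidence score. Reviews longer than CamemBERT's 512-token limit will raise an error unless truncation is requested; in recent Transformers versions this can be passed at call time (a hedged usage note, with illustrative values):

result = nlp("Alad'2 est clairement le meilleur film de l'année 2018.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99}] (score is illustrative)

# Placeholder long review; real reviews over 512 tokens need truncation.
very_long_review = "Un avis interminable. " * 500
print(nlp(very_long_review, truncation=True))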
For faster inference on large batches of reviews, a cloud GPU is recommended, such as those available on AWS, Google Cloud, or Azure.
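To confirm that TensorFlow can actually see a GPU on the machine, a one-line sanity check suffices:

import tensorflow as tf

# An empty list means inference will run on CPU only.
print(tf.config.list_physical_devices("GPU"))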
License
The model and associated code are available under the terms specified in the GitHub repository by Théophile Blard. Proper citation is requested if the work is used in any capacity.