cmarkea/distilcamembert-base-sentiment
Introduction
DistilCamemBERT-Sentiment is a fine-tuned version of DistilCamemBERT for sentiment analysis in French. It is trained on a combination of Amazon product reviews and Allociné film critiques, two corpora chosen to reduce single-domain bias and improve performance on text-classification tasks.
Architecture
DistilCamemBERT-Sentiment is based on DistilCamemBERT, a distilled version of CamemBERT. Distillation roughly halves inference time at comparable power consumption, which makes the model practical to serve at scale in production.
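The latency gain comes mainly from distillation cutting the number of transformer layers in half (12 in CamemBERT-base vs. 6 in DistilCamemBERT, following the DistilBERT recipe). A back-of-the-envelope sketch; the linear-cost assumption is ours, not a measurement from the model card:

```python
# Rough cost model: per-layer cost is roughly constant, so total
# transformer compute scales linearly with the number of layers.
TEACHER_LAYERS = 12   # CamemBERT-base
STUDENT_LAYERS = 6    # DistilCamemBERT (half the layers)

def relative_cost(student_layers: int, teacher_layers: int) -> float:
    """Fraction of the teacher's transformer compute the student needs."""
    return student_layers / teacher_layers

speedup = 1 / relative_cost(STUDENT_LAYERS, TEACHER_LAYERS)
print(f"~{speedup:.0f}x faster inference")  # ~2x, matching the claim above
```

In practice the measured speedup also depends on embedding and head overhead, batch size, and hardware, so treat the 2x figure as an approximation.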
Training
The model is trained on a dataset comprising 204,993 reviews from Amazon and 235,516 critiques from Allociné. Each example is labeled with one of five sentiment classes, from 1 star (terrible) to 5 stars (excellent). On evaluation, the model reaches an exact accuracy of 61.01% (the predicted class equals the true class) and a top-2 accuracy of 88.80% (the true class is among the two highest-scoring classes).
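To make the two metrics concrete, here is a minimal sketch of how exact and top-2 accuracy are computed from per-class scores; the toy predictions below are invented for illustration, not real model output:

```python
def exact_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class is the true label."""
    hits = sum(max(range(len(s)), key=s.__getitem__) == y
               for s, y in zip(scores, labels))
    return hits / len(labels)

def top2_accuracy(scores, labels):
    """Fraction of samples whose true label is among the two best classes."""
    hits = 0
    for s, y in zip(scores, labels):
        top2 = sorted(range(len(s)), key=s.__getitem__, reverse=True)[:2]
        hits += y in top2
    return hits / len(labels)

# Toy example: 3 reviews, 5 star classes (index 0 = "1 star").
scores = [
    [0.05, 0.10, 0.20, 0.40, 0.25],  # true: 4 stars -> exact hit
    [0.30, 0.35, 0.20, 0.10, 0.05],  # true: 1 star  -> top-2 hit only
    [0.10, 0.25, 0.15, 0.20, 0.30],  # true: 3 stars -> miss
]
labels = [3, 0, 2]
print(exact_accuracy(scores, labels))  # 0.333...
print(top2_accuracy(scores, labels))   # 0.666...
```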
Guide: Running Locally
- Install Transformers: make sure the transformers library is installed.
```bash
pip install transformers
```
- Load the Model:
```python
from transformers import pipeline

analyzer = pipeline(
    task="text-classification",
    model="cmarkea/distilcamembert-base-sentiment",
    tokenizer="cmarkea/distilcamembert-base-sentiment",
)
```
- Run Inference:
```python
result = analyzer(
    "J'aime me promener en forêt même si ça me donne mal aux pieds.",
    return_all_scores=True,
)
print(result)
```
- Using ONNX for Optimization:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("cmarkea/distilcamembert-base-sentiment")
model = ORTModelForSequenceClassification.from_pretrained(
    "cmarkea/distilcamembert-base-sentiment"
)
onnx_qa = pipeline("text-classification", model=model, tokenizer=tokenizer)
```
- Cloud GPUs: for faster inference, consider running on cloud GPU instances from providers such as AWS, GCP, or Azure.
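Putting the inference step to use: with all scores returned, the pipeline yields one list of {label, score} dicts per input, and a small helper can pick the winning class. The label strings below ("1 star" … "5 stars") follow the usual star-rating convention and are an assumption here; check the model's config (id2label) for the authoritative names.

```python
def best_label(scores):
    """scores: list of {'label': str, 'score': float} dicts for one input.

    Returns the (label, score) pair with the highest score.
    """
    top = max(scores, key=lambda d: d["score"])
    return top["label"], top["score"]

# Shape of a return_all_scores=True result for a single sentence
# (the scores are illustrative, not real model output):
example = [
    {"label": "1 star", "score": 0.02},
    {"label": "2 stars", "score": 0.06},
    {"label": "3 stars", "score": 0.22},
    {"label": "4 stars", "score": 0.51},
    {"label": "5 stars", "score": 0.19},
]
label, score = best_label(example)
print(label, score)  # 4 stars 0.51
```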
License
The model is licensed under the MIT License, permitting broad reuse provided the copyright and license notice are retained.