distilcamembert base sentiment

cmarkea

Introduction

DistilCamemBERT-Sentiment is a version of DistilCamemBERT fine-tuned for sentiment analysis of French text. It is trained on both the Amazon Reviews and Allociné datasets; combining the two sources reduces the bias a single source would introduce and improves classification performance.

Architecture

DistilCamemBERT-Sentiment is based on the DistilCamemBERT architecture, a distilled version of CamemBERT. Distillation roughly halves inference time compared with CamemBERT at the same power consumption, which lowers the cost of scaling the model in production.
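The halved inference time is a claim about CamemBERT versus DistilCamemBERT on the authors' setup. To check it on your own hardware, a minimal timing sketch like the following can compare any two prediction functions. `median_latency` and `speedup` are hypothetical helpers written for this illustration, not part of Transformers, and the functions passed in are assumed to accept a single string.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency(predict: Callable[[str], object], texts: Sequence[str], warmup: int = 2) -> float:
    """Median wall-clock time in seconds of predict(text) over all texts (hypothetical helper)."""
    # Warm-up calls so one-off costs (lazy initialisation, caches) do not skew the timings.
    for text in texts[:warmup]:
        predict(text)
    timings = []
    for text in texts:
        start = time.perf_counter()
        predict(text)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline: Callable[[str], object], candidate: Callable[[str], object], texts: Sequence[str]) -> float:
    """Ratio of median latencies; a value above 1 means candidate is faster than baseline."""
    return median_latency(baseline, texts) / median_latency(candidate, texts)
```

For example, calling `speedup(camembert_analyzer, analyzer, texts)`, where `camembert_analyzer` is a comparable CamemBERT-based classifier (hypothetical) and `texts` is a list of French sentences, should return a value near 2 if the claim holds on your machine. The median is used rather than the mean so that occasional slow calls do not dominate the comparison.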

Training

The model is trained on 204,993 Amazon reviews and 235,516 Allociné critiques, each labeled with one of five sentiment classes ranging from 1 star (terrible) to 5 stars (excellent). On evaluation, it reaches an exact accuracy of 61.01% and a top-2 accuracy of 88.80%; top-2 accuracy counts a prediction as correct when the true rating is among the two highest-scoring classes.
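The difference between the two reported metrics can be illustrated with a small sketch that computes exact and top-k accuracy from per-class scores. This is not the authors' evaluation code; it assumes the five ratings are encoded as class indices 0 (1 star) to 4 (5 stars), and the toy scores below are made up for illustration.

```python
from __future__ import annotations

from typing import Sequence


def exact_and_top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 2) -> tuple[float, float]:
    """scores[i][c] is the score of class c for example i; labels[i] is its true class index."""
    exact_hits = 0
    top_k_hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        exact_hits += ranked[0] == label   # correct only if the best class is the true one
        top_k_hits += label in ranked[:k]  # correct if the true class is among the k best
    n = len(labels)
    return exact_hits / n, top_k_hits / n


# Toy example: four reviews, five rating classes each.
toy_scores = [
    [0.10, 0.60, 0.20, 0.05, 0.05],  # true class 1: exact hit
    [0.05, 0.10, 0.30, 0.40, 0.15],  # true class 2: ranked second, so top-2 hit only
    [0.70, 0.20, 0.05, 0.03, 0.02],  # true class 4: miss
    [0.00, 0.00, 0.10, 0.20, 0.70],  # true class 4: exact hit
]
print(exact_and_top_k_accuracy(toy_scores, [1, 2, 4, 4]))  # (0.5, 0.75)
```

Top-2 accuracy is always at least as high as exact accuracy, and with ordinal labels such as star ratings it mostly captures predictions that land on a neighbouring rating.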

Guide: Running Locally

  1. Install Transformers: the pipeline needs the transformers library plus a deep learning backend such as PyTorch.

    pip install transformers torch
    
  2. Load the Model:

    from transformers import pipeline
    
    analyzer = pipeline(
        task='text-classification',
        model="cmarkea/distilcamembert-base-sentiment",
        tokenizer="cmarkea/distilcamembert-base-sentiment"
    )
    
  3. Run Inference:

    result = analyzer(
        "J'aime me promener en forêt même si ça me donne mal aux pieds.",
        top_k=None  # score for every rating class; replaces the deprecated return_all_scores=True
    )
    print(result)
    
  4. Using ONNX for Optimization:

    # Requires: pip install optimum[onnxruntime]
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline
    
    model_id = "cmarkea/distilcamembert-base-sentiment"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Same model, executed with ONNX Runtime instead of PyTorch
    model = ORTModelForSequenceClassification.from_pretrained(model_id)
    onnx_analyzer = pipeline("text-classification", model=model, tokenizer=tokenizer)
    print(onnx_analyzer("Ce film était excellent !", top_k=None))
    
  5. Cloud GPUs: For faster processing, run the model on a GPU, for example through a cloud service such as AWS, GCP, or Azure, and pass device=0 to pipeline() to place it on the first GPU.
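When all class scores are requested, the pipeline returns one `{"label": ..., "score": ...}` entry per rating class. A small post-processing sketch can turn that list into a single star rating plus a score-weighted average rating. It assumes the label strings look like "1 star", "2 stars", ..., "5 stars" (check `model.config.id2label` to confirm), and `star_rating` is a hypothetical helper, not part of the model card or of Transformers.

```python
from __future__ import annotations


def star_rating(scores: list[dict]) -> tuple[int, float]:
    """Return (most likely star rating, score-weighted expected rating)."""
    # Assumption: each label starts with the number of stars, e.g. "4 stars".
    stars = {int(item["label"].split()[0]): item["score"] for item in scores}
    best = max(stars, key=stars.get)
    total = sum(stars.values())
    # Weighted average of 1..5 using the normalised scores.
    expected = sum(star * score for star, score in stars.items()) / total
    return best, expected


example_output = [
    {"label": "1 star", "score": 0.1},
    {"label": "2 stars", "score": 0.1},
    {"label": "3 stars", "score": 0.2},
    {"label": "4 stars", "score": 0.4},
    {"label": "5 stars", "score": 0.2},
]
best, expected = star_rating(example_output)
print(best, round(expected, 2))  # 4 3.5
```

Depending on the Transformers version and arguments, the output for a single text may be nested one level deeper (a list containing the list of scores); pass the inner list in that case. The expected rating is a convenience for ranking or averaging reviews and is not a metric reported by the authors.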

License

The model is licensed under the MIT License, allowing for broad reuse with attribution.
