phobert-base-vietnamese-sentiment

wonrax

Introduction

The PhoBERT-Base-Vietnamese-Sentiment model is fine-tuned for sentiment analysis on Vietnamese text. It is based on the vinai/phobert-base model and was fine-tuned on a dataset of 30,000 e-commerce reviews. The model classifies text into one of three sentiment categories: Negative (NEG), Positive (POS), or Neutral (NEU).
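
As a rough illustration of the three-way output, the sketch below maps a vector of class probabilities to one of the labels. The helper `top_label` is hypothetical, and the index order (NEG, POS, NEU) simply follows the order in which the categories are listed above; that order is an assumption, and `model.config.id2label` on the loaded model is the authoritative mapping.

```python
# Assumed index order, taken from the order the categories are listed in;
# check model.config.id2label on the loaded model before relying on it.
LABELS = ("NEG", "POS", "NEU")


def top_label(probs, labels=LABELS):
    """Return (label, probability) for the highest-scoring class."""
    if len(probs) != len(labels):
        raise ValueError("expected one probability per label")
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Illustrative probabilities, not real model output.
label, score = top_label([0.002, 0.988, 0.010])
print(label, score)
```

Returning the probability alongside the label makes it easy to apply a confidence threshold later if low-confidence predictions should be handled separately.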

Architecture

The model uses the RoBERTa architecture, as pre-trained for Vietnamese in PhoBERT, and is implemented in PyTorch. It is designed for text classification, specifically sentiment analysis.

Training

The PhoBERT-Base-Vietnamese-Sentiment model was fine-tuned from vinai/phobert-base on 30,000 e-commerce reviews for three-way sentiment classification (NEG, POS, NEU).

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure that you have PyTorch and the Transformers library installed.

    pip install torch transformers
    
  2. Load the Model: Use the code snippet below to load and run the model:

    import torch
    from transformers import RobertaForSequenceClassification, AutoTokenizer
    
    model = RobertaForSequenceClassification.from_pretrained("wonrax/phobert-base-vietnamese-sentiment")
    tokenizer = AutoTokenizer.from_pretrained("wonrax/phobert-base-vietnamese-sentiment", use_fast=False)
    
    # Word-segment the input first: the syllables of each word are joined with "_"
    sentence = 'Đây là mô_hình rất hay , phù_hợp với điều_kiện và nhu_cầu của nhiều người .'
    
    input_ids = torch.tensor([tokenizer.encode(sentence)])
    
    with torch.no_grad():
        out = model(input_ids)
        # One probability per class; model.config.id2label maps each index to its label
        print(out.logits.softmax(dim=-1).tolist())
    
  3. Cloud GPUs: For better performance, especially with large datasets or in production environments, consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.
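
Step 2 expects word-segmented input, in which the syllables of a multi-syllable word are joined with underscores. The sketch below shows only that target format, starting from a hand-written list of words; `to_segmented` is a hypothetical helper, not part of any library. In practice the word boundaries would come from a Vietnamese word segmenter (the PhoBERT authors recommend VnCoreNLP's RDRSegmenter), which is not shown here.

```python
def to_segmented(words):
    """Join the syllables inside each word with '_' and the words with spaces."""
    return " ".join("_".join(word.split()) for word in words)


# Hand-segmented example: each list item is one word (or punctuation mark),
# and multi-syllable words contain spaces between their syllables.
words = ["Đây", "là", "mô hình", "rất", "hay", ",",
         "phù hợp", "với", "điều kiện", "."]

sentence = to_segmented(words)
print(sentence)
```

The resulting string can be passed to `tokenizer.encode` in the same way as the `sentence` variable in step 2.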

License

The PhoBERT-Base-Vietnamese-Sentiment model is released under the MIT License, which permits use, modification, and redistribution.
