Introduction

GUWEN-SENT is a sentiment classification model for classical Chinese poetry. It uses a transformer architecture from the BERT family, specifically a RoBERTa variant, to classify the sentiment expressed in ancient literary Chinese texts.

Architecture

The model is based on the RoBERTa transformer architecture, adapted to classical Chinese, and is implemented in PyTorch. It is a text classification model: given a passage of classical Chinese poetry, it predicts the sentiment that the passage expresses.

Training

The model was trained on a dataset of classical Chinese texts so that it can classify the sentiments found in this genre. Training consisted of fine-tuning the RoBERTa architecture to adapt it to the linguistic features of ancient Chinese.

Guide: Running Locally

  1. Clone the Repository:
    git clone https://github.com/ethan-yt/guwen-sent
    cd guwen-sent
    
  2. Install Dependencies: Ensure you have Python and PyTorch installed. Then, install the required packages:
    pip install -r requirements.txt
    
  3. Download the Model: Use the Hugging Face Transformers library to download the model:
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    
    model_name = "ethanyt/guwen-sent"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  4. Run Inference: Tokenize a line of classical Chinese text and pass it through the model. The raw per-class scores are returned in outputs.logits:
    inputs = tokenizer("滚滚长江东逝水，浪花淘尽英雄", return_tensors="pt")
    outputs = model(**inputs)
    
  5. Use a Cloud GPU (Optional): For faster processing, especially with large datasets or longer texts, consider running the model on a cloud GPU service such as AWS EC2, Google Cloud Platform, or Azure.
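
The inference step returns raw logits, not a sentiment label. The sketch below shows one common way to turn logits into a label and a probability using a softmax. It is written in plain Python so it runs without the model; the helper names softmax and top_label, the example logits, and the LABEL_n names are illustrative assumptions, not part of the GUWEN-SENT repository. With the real model, the logits would come from outputs.logits and the label names from model.config.id2label.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical values for illustration only. With the loaded model, use:
#   logits = outputs.logits[0].detach().tolist()
#   id2label = model.config.id2label
example_logits = [0.2, 2.5, -1.0]
example_id2label = {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}

label, prob = top_label(example_logits, example_id2label)
print(label, round(prob, 3))
```

The actual number of classes and their names depend on the model's configuration, so check model.config.id2label rather than relying on the placeholder names used here.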

License

GUWEN-SENT is distributed under the Apache-2.0 license, allowing users significant freedom to use, modify, and distribute the software as long as they comply with the terms of the license.
