roberta-base-stocktwits-finetuned

zhayunduo

Introduction

The roberta-base-stocktwits-finetuned model is a sentiment-inference model that classifies stock-related comments as either 'Bullish' or 'Bearish'. It was developed by NUS ISS students Frank Cao, Gerong Zhang, Jiaqi Yao, Sikai Ni, and Yunduo Zhang, and was fine-tuned from a RoBERTa base model on comments from Stocktwits.

Architecture

The model is based on the RoBERTa architecture and uses the Transformers library with PyTorch. It is fine-tuned specifically for English-language comments about finance, using a dataset of 3.2 million user-labeled comments from Stocktwits.
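
The architectural details above can be confirmed from the model's published configuration. The snippet below is a small sketch using the standard Transformers AutoConfig API; the values noted in the comments are what a stock RoBERTa-base classifier with two labels is expected to report.

    from transformers import AutoConfig
    
    config = AutoConfig.from_pretrained('zhayunduo/roberta-base-stocktwits-finetuned')
    print(config.model_type)          # expected: 'roberta'
    print(config.num_labels)          # expected: 2 (the 'Bearish' / 'Bullish' classes)
    print(config.num_hidden_layers)   # expected: 12, i.e. the RoBERTa base encoder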

Training

The model was fine-tuned with the following hyperparameters; an illustrative training setup using these values is sketched after the results below.

  • Batch Size: 32
  • Learning Rate: 2e-5

Training Results:

  • Epoch 1: Train Loss: 0.3495, Validation Loss: 0.2956, Validation Accuracy: 0.8679
  • Epoch 2: Train Loss: 0.2717, Validation Loss: 0.2235, Validation Accuracy: 0.9021
  • Epoch 3: Train Loss: 0.2360, Validation Loss: 0.1875, Validation Accuracy: 0.9210
  • Epoch 4: Train Loss: 0.2106, Validation Loss: 0.1603, Validation Accuracy: 0.9343
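
The original training script is not reproduced here, but the hyperparameters above map directly onto a standard Transformers Trainer setup. The sketch below is illustrative only: the two-row dataset stands in for the 3.2 million labeled Stocktwits comments, and the 0 = Bearish / 1 = Bullish label convention is an assumption.

    from datasets import Dataset
    from transformers import (RobertaForSequenceClassification, RobertaTokenizer,
                              Trainer, TrainingArguments)
    
    tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
    model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=2)
    
    # toy stand-in for the real corpus (assumed convention: 0 = Bearish, 1 = Bullish)
    train_ds = Dataset.from_dict({
        'text': ['cashtag_AAPL to the moon rocket', 'dumping everything before close'],
        'label': [1, 0],
    })
    train_ds = train_ds.map(
        lambda batch: tokenizer(batch['text'], truncation=True, padding='max_length', max_length=64),
        batched=True,
    )
    
    args = TrainingArguments(
        output_dir='roberta-stocktwits-finetuned',
        per_device_train_batch_size=32,   # batch size reported above
        learning_rate=2e-5,               # learning rate reported above
        num_train_epochs=4,               # four epochs of results are reported above
    )
    
    Trainer(model=model, args=args, train_dataset=train_ds).train()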

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure you have the Transformers library installed, along with PyTorch and the emoji package used in the preprocessing step below. You can install them using pip:
    pip install transformers torch emoji
    
  2. Load the Model and Tokenizer:
    from transformers import RobertaForSequenceClassification, RobertaTokenizer, pipeline
    
    tokenizer_loaded = RobertaTokenizer.from_pretrained('zhayunduo/roberta-base-stocktwits-finetuned')
    model_loaded = RobertaForSequenceClassification.from_pretrained('zhayunduo/roberta-base-stocktwits-finetuned')
    
  3. Preprocess Input Text (a quick before/after example of this function is shown after step 4):
    import emoji
    import re
    
    def process_text(texts):
        # strip URLs
        texts = re.sub(r'https?://\S+', "", texts)
        texts = re.sub(r'www\.\S+', "", texts)
        # replace the HTML-encoded apostrophe with a plain one
        texts = texts.replace('&#39;', "'")
        # rewrite hashtags, cashtags, and user mentions as plain tokens
        texts = re.sub(r'(\#)(\S+)', r'hashtag_\2', texts)
        texts = re.sub(r'(\$)([A-Za-z]+)', r'cashtag_\2', texts)
        texts = re.sub(r'(\@)(\S+)', r'mention_\2', texts)
        # convert emoji to their text names (e.g. a rocket emoji becomes "rocket ")
        texts = emoji.demojize(texts, delimiters=("", " "))
        return texts.strip()
    
  4. Run Sentiment Analysis:
    nlp = pipeline("text-classification", model=model_loaded, tokenizer=tokenizer_loaded)
    sentences = ['just buy', 'just sell it', 'entity rocket to the sky!', 'go down',
                 'even though it is going up, I still think it will not keep this trend in the near future']
    # apply the preprocessing from step 3 (this matters when comments contain
    # URLs, hashtags, cashtags, mentions, or emoji)
    sentences = [process_text(s) for s in sentences]
    results = nlp(sentences)  # each result is a dict with a 'label' and a 'score'
    print(results)
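
As referenced in step 3, the preprocessing function rewrites URLs, cashtags, mentions, and emoji before classification. The comment and handle below are made up, and the expected output shown in the comment is approximate.

    raw = "$TSLA breaking out 🚀 https://example.com @trader123"
    print(process_text(raw))
    # roughly: "cashtag_TSLA breaking out rocket  mention_trader123"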
    

Consider using cloud GPUs like AWS EC2, Google Cloud, or Azure to handle larger datasets and improve processing speed.
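
For example, on a GPU instance the same pipeline can be placed on a CUDA device and fed comments in batches; the device index, corpus size, and batch size below are illustrative choices rather than values from the model card.

    import torch
    from transformers import pipeline
    
    device = 0 if torch.cuda.is_available() else -1   # -1 falls back to CPU when no GPU is present
    nlp_gpu = pipeline("text-classification",
                       model="zhayunduo/roberta-base-stocktwits-finetuned",
                       device=device)
    
    comments = ["just buy"] * 10_000                   # stand-in for a large batch of comments
    results = nlp_gpu(comments, batch_size=256)        # batched inference keeps the GPU busy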

License

This model is licensed under the Apache-2.0 License.
