roberta base stocktwits finetuned
zhayunduo
Introduction
The roberta-base-stocktwits-finetuned model is a sentiment classification model that labels stock-related comments as either 'Bullish' or 'Bearish'. It was developed by NUS ISS students Frank Cao, Gerong Zhang, Jiaqi Yao, Sikai Ni, and Yunduo Zhang, and is fine-tuned from the RoBERTa base model on comments from Stocktwits.
Architecture
The model is based on the RoBERTa architecture and is used through the Transformers library with PyTorch. It is fine-tuned for English-language finance comments, using a dataset of 3.2 million user-labeled comments from Stocktwits.
Training
- Batch Size: 32
- Learning Rate: 2e-5
Training Results:
- Epoch 1: Train Loss: 0.3495, Validation Loss: 0.2956, Validation Accuracy: 0.8679
- Epoch 2: Train Loss: 0.2717, Validation Loss: 0.2235, Validation Accuracy: 0.9021
- Epoch 3: Train Loss: 0.2360, Validation Loss: 0.1875, Validation Accuracy: 0.9210
- Epoch 4: Train Loss: 0.2106, Validation Loss: 0.1603, Validation Accuracy: 0.9343
Guide: Running Locally
To run the model locally, follow these steps:
- Install Dependencies: The steps below need the Transformers library with PyTorch, plus the emoji package that the preprocessing step imports. You can install them using pip:
pip install transformers torch emoji
- Load the Model and Tokenizer:
from transformers import RobertaForSequenceClassification, RobertaTokenizer, pipeline

tokenizer_loaded = RobertaTokenizer.from_pretrained('zhayunduo/roberta-base-stocktwits-finetuned')
model_loaded = RobertaForSequenceClassification.from_pretrained('zhayunduo/roberta-base-stocktwits-finetuned')
- Preprocess Input Text:
import re

import emoji

def process_text(texts):
    # Remove URLs
    texts = re.sub(r'https?://\S+', "", texts)
    texts = re.sub(r'www.\S+', "", texts)
    # Restore HTML-escaped apostrophes (the entity '&#39;' is assumed here)
    texts = texts.replace('&#39;', "'")
    # Rewrite hashtags, cashtags, and mentions as plain tokens
    texts = re.sub(r'(\#)(\S+)', r'hashtag_\2', texts)
    texts = re.sub(r'(\$)([A-Za-z]+)', r'cashtag_\2', texts)
    texts = re.sub(r'(\@)(\S+)', r'mention_\2', texts)
    # Replace emoji with their text names
    texts = emoji.demojize(texts, delimiters=("", " "))
    return texts.strip()
- Run Sentiment Analysis:
nlp = pipeline("text-classification", model=model_loaded, tokenizer=tokenizer_loaded)
sentences = [
    'just buy',
    'just sell it',
    'entity rocket to the sky!',
    'go down',
    'even though it is going up, I still think it will not keep this trend in the near future',
]
# Run each sentence through the preprocessing step before classifying
sentences = [process_text(s) for s in sentences]
results = nlp(sentences)
print(results)
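To get a feel for what the preprocessing step does to a typical post, the regex substitutions can be tried on their own. The sketch below is a reduced copy of `process_text` that skips the emoji step so it runs with the standard library alone; the function name `normalize_tags` and the sample post are made up for illustration.

```python
import re


def normalize_tags(text):
    """Hypothetical, emoji-free subset of the process_text step in the guide."""
    text = re.sub(r'https?://\S+', "", text)                # drop URLs
    text = re.sub(r'(\#)(\S+)', r'hashtag_\2', text)        # #bullish -> hashtag_bullish
    text = re.sub(r'(\$)([A-Za-z]+)', r'cashtag_\2', text)  # $AAPL -> cashtag_AAPL
    text = re.sub(r'(\@)(\S+)', r'mention_\2', text)        # @user -> mention_user
    return text.strip()


sample = "$AAPL to the moon! #bullish @trader_joe see https://example.com/chart"
cleaned = normalize_tags(sample)
print(cleaned)
# cashtag_AAPL to the moon! hashtag_bullish mention_trader_joe see
```

The ticker, hashtag, and username survive as ordinary word-like tokens, while the URL is removed entirely.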
Consider using cloud GPUs like AWS EC2, Google Cloud, or Azure to handle larger datasets and improve processing speed.
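For larger inputs, one option is to feed the pipeline in fixed-size chunks and translate its labels into 'Bullish' and 'Bearish' as results come back. The sketch below assumes the pipeline returns one `{'label': ..., 'score': ...}` dict per input, which is the usual shape for `text-classification`. The helper name `classify_in_batches`, the default chunk size, and the contents of `assumed_map` are assumptions, so check `model_loaded.config.id2label` for the label strings this checkpoint actually emits. A stub stands in for the real pipeline so the example runs without downloading the model.

```python
def classify_in_batches(classifier, texts, batch_size=32, label_map=None):
    """Run `classifier` over `texts` in chunks; return (text, label, score) tuples.

    `classifier` is any callable that takes a list of strings and returns a
    list of {'label': str, 'score': float} dicts, such as the `nlp` pipeline.
    `label_map` optionally renames labels; unknown labels pass through unchanged.
    """
    label_map = label_map or {}
    rows = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        for text, result in zip(chunk, classifier(chunk)):
            label = label_map.get(result["label"], result["label"])
            rows.append((text, label, result["score"]))
    return rows


def fake_classifier(batch):
    """Stand-in for the real pipeline: labels by keyword, purely for the demo."""
    return [
        {"label": "Positive" if "buy" in text else "Negative", "score": 0.5}
        for text in batch
    ]


# Assumed mapping; the real label strings may differ.
assumed_map = {"Positive": "Bullish", "Negative": "Bearish"}
rows = classify_in_batches(
    fake_classifier,
    ["just buy", "just sell it", "go down"],
    batch_size=2,
    label_map=assumed_map,
)
for row in rows:
    print(row)
```

With the real pipeline this would be called as `classify_in_batches(nlp, sentences, label_map=assumed_map)`. The pipeline also accepts its own `batch_size` argument when it is called, which may be enough if chunking by hand is not needed.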
License
This model is licensed under the Apache-2.0 License.