CryptoBERT
Introduction
CryptoBERT is a pre-trained natural language processing (NLP) model designed for analyzing the language and sentiment of social media posts related to cryptocurrencies. It builds on vinai's BERTweet-base model and is fine-tuned on a large corpus of cryptocurrency-related social media posts.
Architecture
CryptoBERT is based on the BERT architecture and is optimized for sentiment classification in the cryptocurrency domain. It accepts sequences of up to 514 tokens, although a maximum sequence length of 128 is recommended for best performance.
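The 128-token recommendation can be enforced before inference by clipping token-id sequences. A minimal sketch; the helper name is illustrative, not part of the model's API:

```python
# MAX_LEN follows the model card's recommended maximum sequence length.
MAX_LEN = 128

def truncate_ids(token_ids, max_len=MAX_LEN):
    """Return at most the first max_len token ids (illustrative helper)."""
    return token_ids[:max_len]

# A 200-token sequence is clipped to the first 128 ids; shorter ones pass through.
clipped = truncate_ids(list(range(200)))
```

In practice the same effect is achieved by passing `truncation=True` and `max_length` to the tokenizer or pipeline, as shown in the guide below.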
Training
The model was fine-tuned on a balanced dataset of 2 million labeled StockTwits posts, categorized into "Bearish" (0), "Neutral" (1), and "Bullish" (2) sentiments. The training corpus consisted of 3.2 million unique posts longer than four words, sourced from StockTwits, Telegram, Reddit, and Twitter.
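The numeric class ids above can be mapped back to readable sentiment names. A minimal sketch; the mapping follows the ids listed above, and the function name is illustrative:

```python
# Class ids as documented above: 0 = Bearish, 1 = Neutral, 2 = Bullish.
ID2LABEL = {0: "Bearish", 1: "Neutral", 2: "Bullish"}

def label_name(pred_id: int) -> str:
    """Translate a predicted class id into its sentiment name."""
    return ID2LABEL[pred_id]
```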
Guide: Running Locally
- Dependencies: Ensure you have Python and the `transformers` library installed.
- Model Setup: Load the CryptoBERT model and tokenizer:

```python
from transformers import TextClassificationPipeline, AutoModelForSequenceClassification, AutoTokenizer

model_name = "ElKulako/cryptobert"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
```
- Pipeline Creation: Set up the text classification pipeline:

```python
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer,
                                  max_length=64, truncation=True,
                                  padding='max_length')
```
- Inference Example: Analyze sentiment for a list of posts:

```python
posts = ["post_1 content", "post_2 content", "post_3 content"]
preds = pipe(posts)
print(preds)
```
- Cloud GPU Suggestion: For efficient processing, consider using cloud services with GPU support, such as AWS EC2 with NVIDIA GPUs or Google Cloud's AI Platform.
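On a GPU-equipped machine, the pipeline can be moved onto the GPU via the `device` argument accepted by `transformers` pipelines. A minimal sketch, assuming PyTorch is installed:

```python
import torch

# device=0 selects the first CUDA GPU; -1 keeps the pipeline on the CPU.
device = 0 if torch.cuda.is_available() else -1

# Pass it when building the pipeline, e.g.:
# pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer,
#                                   max_length=64, truncation=True,
#                                   padding='max_length', device=device)
```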
License
CryptoBERT is released under the MIT License, allowing for flexible use, modification, and distribution.