ClimateBERT: DistilRoBERTa-Base-Climate-Specificity
Introduction
The DistilRoBERTa-Base-Climate-Specificity model is a fine-tuned language model developed by ClimateBERT for classifying climate-related paragraphs into specific and non-specific categories. It is based on the DistilRoBERTa architecture and trained using the climate-specific dataset provided by ClimateBERT. The model is primarily intended for use with paragraphs rather than individual sentences.
Architecture
The model builds on the DistilRoBERTa architecture, which is a smaller, faster, and lighter version of the RoBERTa model. It includes a classification head specifically tuned for the task of identifying specificity within climate-related texts.
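As a rough illustration (not the model's actual weights), a sequence-classification head ends in a softmax over one logit per class; a minimal sketch of that final step, with invented logit values:

```python
import math

def softmax(logits):
    """Convert raw logits to class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-logit output from the classification head
# (one logit per class: specific vs. non-specific):
probs = softmax([2.1, -0.3])
```

The class with the larger logit receives the larger probability, and the probabilities sum to one.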
Training
The model was fine-tuned on the climatebert/climate_specificity dataset. This dataset is designed to capture the nuance of specificity in climate discourse. Although the model is optimized for paragraphs, it may not perform as effectively on individual sentences.
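Since the model is tuned for paragraph-level input, it can help to split raw documents on blank lines before classification rather than feeding it sentence by sentence. A minimal helper (the example text is invented):

```python
import re

def split_into_paragraphs(text: str) -> list:
    """Split raw text on blank lines so each chunk is a paragraph."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

doc = (
    "Our firm will cut Scope 1 emissions by 40% by 2030.\n\n"
    "We care deeply about the climate."
)
paragraphs = split_into_paragraphs(doc)
# Each element of `paragraphs` is one candidate input for the classifier.
```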
Guide: Running Locally
To use this model for text classification, follow these steps:
- Install required libraries:

```bash
pip install transformers datasets
```
- Load the model and tokenizer:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
import datasets

model_name = "climatebert/distilroberta-base-climate-specificity"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, max_length=512)
```
- Load the dataset:

```python
dataset_name = "climatebert/climate_specificity"
dataset = datasets.load_dataset(dataset_name, split="test")
```
- Create a pipeline for text classification:

```python
# device=0 selects the first GPU; use device=-1 to run on CPU
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0)
```
- Run the pipeline on the dataset:

```python
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

for out in tqdm(pipe(KeyDataset(dataset, "text"), padding=True, truncation=True)):
    print(out)
```
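Beyond printing each prediction, it is often useful to aggregate the results, e.g. counting how many paragraphs fall into each class. A minimal sketch of such a helper; the `fake_pipe` below is a stand-in stub for the real pipeline (its labels and rule are invented for illustration):

```python
from collections import Counter

def label_distribution(pipe, texts, **pipe_kwargs):
    """Run a text-classification pipeline over texts and tally predicted labels."""
    counts = Counter()
    for out in pipe(texts, **pipe_kwargs):
        counts[out["label"]] += 1
    return counts

# Stub standing in for the real Hugging Face pipeline (hypothetical labels/scores):
def fake_pipe(texts, **kwargs):
    return [
        {"label": "specific" if "2030" in t else "non-specific", "score": 0.9}
        for t in texts
    ]

counts = label_distribution(
    fake_pipe,
    ["Cut emissions 40% by 2030.", "We value sustainability."],
)
print(counts)
```

The same helper works unchanged with the real `pipe` from the steps above, passing `padding=True, truncation=True` as keyword arguments.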
Suggestion for Cloud GPUs
To enhance performance, especially with large datasets or when running multiple inferences, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure.
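Whether running locally or on a cloud GPU instance, the pipeline's `device` argument controls placement; a minimal sketch of the usual selection logic (in practice you would pass `torch.cuda.is_available()` as the flag):

```python
def pick_device(cuda_available: bool) -> int:
    """transformers pipelines accept device=-1 for CPU and device>=0 for a GPU index."""
    return 0 if cuda_available else -1

# e.g.:
# import torch
# pipe = pipeline("text-classification", model=model, tokenizer=tokenizer,
#                 device=pick_device(torch.cuda.is_available()))
```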
License
The DistilRoBERTa-Base-Climate-Specificity model is released under the Apache 2.0 License. This permits use, modification, and distribution of the model under specified conditions.