DistilRoBERTa-Base-Climate-Detector
Introduction
The DistilRoBERTa-Base-Climate-Detector is a fine-tuned version of the ClimateBERT language model, designed to detect climate-related content in paragraphs. It builds on climatebert/distilroberta-base-climate-f and was fine-tuned on the climatebert/climate_detection dataset.
Architecture
The model is based on the DistilRoBERTa architecture, a smaller, faster distilled version of the RoBERTa model. It has been tailored for climate-related text classification, making it efficient at detecting climate-related paragraphs.
Training
The DistilRoBERTa-Base-Climate-Detector was fine-tuned using the climatebert/climate_detection dataset. It is important to note that the model was trained on paragraph-length text, which may impact its performance on shorter text segments like sentences.
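Because the model was trained on paragraph-length inputs, longer documents are best split into paragraphs before classification rather than fed as single sentences or whole texts. A minimal splitting sketch (the blank-line heuristic below is an illustrative assumption, not part of the model card):

```python
def split_into_paragraphs(text: str) -> list[str]:
    """Naive paragraph splitter: treats blank lines as paragraph breaks.

    Illustrative only; real documents may need smarter segmentation
    (e.g. based on markup or layout).
    """
    return [p.strip() for p in text.split("\n\n") if p.strip()]


doc = "Rising sea levels threaten coastal cities.\n\nAn unrelated second paragraph."
paragraphs = split_into_paragraphs(doc)
print(paragraphs)
```

Each resulting paragraph can then be passed to the classifier individually, matching the input granularity the model saw during fine-tuning.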
Guide: Running Locally
- Installation: Ensure you have the transformers and datasets libraries installed.
- Load the Model and Tokenizer:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_name = "climatebert/distilroberta-base-climate-detector"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, max_len=512)
```
- Load the Dataset:
```python
import datasets

dataset_name = "climatebert/climate_detection"
dataset = datasets.load_dataset(dataset_name, split="test")
```
- Run the Inference Pipeline:
```python
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

# device=0 selects the first GPU; use device=-1 to run on CPU.
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0)

# KeyDataset streams the "text" column of the dataset through the pipeline.
for out in tqdm(pipe(KeyDataset(dataset, "text"), padding=True, truncation=True)):
    print(out)
```
- Hardware: For optimal performance, consider utilizing cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
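Each pipeline output is a dict with a label and a score; the score comes from a softmax over the model's two logits. A self-contained sketch of that post-processing with made-up logit values (the logit numbers and the "no"/"yes" label names are illustrative assumptions; check model.config.id2label for the real mapping):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw logits for one paragraph: index 0 = "no", index 1 = "yes".
logits = [-1.2, 2.3]
probs = softmax(logits)
labels = ["no", "yes"]
best = max(range(len(probs)), key=probs.__getitem__)
print({"label": labels[best], "score": round(probs[best], 4)})
```

This mirrors what the text-classification pipeline does internally after the forward pass: pick the highest-probability class and report its probability as the score.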
License
The DistilRoBERTa-Base-Climate-Detector is released under the Apache 2.0 license, allowing for both personal and commercial use.