distilroberta base climate detector

climatebert

Introduction

The DistilRoBERTa-Base-Climate-Detector is a fine-tuned version of the ClimateBERT language model, designed for detecting climate-related content in paragraphs. It is built on the climatebert/distilroberta-base-climate-f model and fine-tuned on the climatebert/climate_detection dataset.

Architecture

The model is based on the DistilRoBERTa architecture, a distilled (smaller and faster) version of the RoBERTa model. Fine-tuning tailors it to climate-related text classification, specifically to deciding whether a paragraph is climate-related.

Training

The DistilRoBERTa-Base-Climate-Detector was fine-tuned using the climatebert/climate_detection dataset. Note that the model was trained on paragraph-length text, so it may perform less reliably on shorter text segments such as individual sentences.
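One way to work around the paragraph-length caveat is to merge short sentences into longer chunks before classifying them. The following is a minimal sketch of that idea, not part of the model's own tooling: the helper name group_sentences, the naive regex sentence splitter, and the 40-word default are all illustrative assumptions.

```python
import re


def group_sentences(text, min_words=40):
    """Merge consecutive sentences until each chunk has at least min_words words.

    Hypothetical helper: the 40-word default is an arbitrary stand-in for
    "paragraph-length", not a value taken from the model's training setup.
    """
    # Naive split after ., ! or ? followed by whitespace (an assumption; a real
    # sentence tokenizer would handle abbreviations and similar cases better).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks, current = [], []
    for sentence in sentences:
        current.append(sentence)
        if sum(len(s.split()) for s in current) >= min_words:
            chunks.append(" ".join(current))
            current = []

    if current:
        tail = " ".join(current)
        if chunks:
            # Attach a short tail to the previous chunk instead of leaving it alone.
            chunks[-1] = chunks[-1] + " " + tail
        else:
            chunks.append(tail)
    return chunks
```

Each returned chunk can then be passed to the classifier in place of a single sentence.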

Guide: Running Locally

  1. Installation: Ensure you have the transformers and datasets libraries installed. The code below also imports tqdm and requires a PyTorch installation.
  2. Load the Model and Tokenizer:
    from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
    # Hugging Face Hub ID of the fine-tuned detector
    model_name = "climatebert/distilroberta-base-climate-detector"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    # Cap inputs at 512 tokens so truncation has a limit to apply
    tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)
    
  3. Load the Dataset:
    import datasets
    dataset_name = "climatebert/climate_detection"
    dataset = datasets.load_dataset(dataset_name, split="test")
    
  4. Run the Inference Pipeline:
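    from transformers.pipelines.pt_utils import KeyDataset
    from tqdm.auto import tqdm
    # device=0 selects the first CUDA GPU; use device=-1 to run on CPU
    pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0)
    # Stream the "text" column through the pipeline; each result is a dict with "label" and "score"
    for out in tqdm(pipe(KeyDataset(dataset, "text"), padding=True, truncation=True)):
        print(out)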
    
  5. Hardware: A GPU speeds up inference. If none is available locally, consider cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
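Printing every prediction in step 4 gets noisy on a full test split, so it can help to collect the outputs into a list and summarize them afterwards. The sketch below assumes only that each pipeline result is a dict with "label" and "score" keys. The helper name summarize, the 0.8 cutoff, and the sample values (including the "yes"/"no" label strings) are illustrative placeholders; the actual label names come from the model's configuration.

```python
from collections import Counter


def summarize(outputs, min_score=0.8):
    """Tally predicted labels and list indices of predictions scoring below min_score."""
    counts = Counter(out["label"] for out in outputs)
    low_confidence = [i for i, out in enumerate(outputs) if out["score"] < min_score]
    return counts, low_confidence


# Made-up results in the pipeline's output shape; the label strings are placeholders.
sample_outputs = [
    {"label": "yes", "score": 0.98},
    {"label": "no", "score": 0.91},
    {"label": "yes", "score": 0.62},
]

counts, low_confidence = summarize(sample_outputs)
print(dict(counts))
print(low_confidence)
```

To use it with the guide's code, append each out to a list inside the loop and pass that list to summarize once the loop finishes.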

License

The DistilRoBERTa-Base-Climate-Detector is released under the Apache 2.0 license, allowing for both personal and commercial use.
