ClimateBERT: DistilRoBERTa-Base-Climate-Specificity

Introduction

The DistilRoBERTa-Base-Climate-Specificity model is a fine-tuned language model from the ClimateBERT project that classifies climate-related paragraphs as either specific or non-specific. It is based on the DistilRoBERTa architecture and was fine-tuned on the climate-specificity dataset provided by ClimateBERT. The model is intended for paragraph-level input rather than individual sentences.

Architecture

The model builds on DistilRoBERTa, a smaller, faster, and lighter distilled version of RoBERTa. A sequence-classification head on top of this backbone is fine-tuned to distinguish specific from non-specific climate-related text.
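
The classification head's shape and label mapping can be read directly off the model config. Below is a minimal sketch; the exact label names shown in the comment are an assumption, not confirmed by this card:

    from transformers import AutoConfig

    # Inspect the classification head attached to the DistilRoBERTa backbone
    config = AutoConfig.from_pretrained("climatebert/distilroberta-base-climate-specificity")
    print(config.num_labels)  # number of output classes
    print(config.id2label)    # label names, e.g. {0: "non-specific", 1: "specific"} (assumed)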

Training

The model was fine-tuned on the climatebert/climate_specificity dataset. This dataset is designed to capture the nuance of specificity in climate discourse. Although the model is optimized for paragraphs, it may not perform as effectively on sentences.
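
As a quick sanity check before running inference, the dataset can be loaded and inspected directly. This is a minimal sketch; only the "text" field is confirmed by the guide below, so the other printed fields are whatever the dataset actually contains:

    import datasets

    # Load all splits to inspect the dataset's structure
    ds = datasets.load_dataset("climatebert/climate_specificity")
    print(ds)                    # split names and example counts
    sample = ds["test"][0]
    print(sample.keys())         # available fields; "text" holds the paragraph
    print(sample["text"][:200])  # first 200 characters of one paragraph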

Guide: Running Locally

To use this model for text classification, follow these steps:

  1. Install required libraries:

    pip install transformers datasets tqdm
    
  2. Load the model and tokenizer:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
    import datasets
    
    model_name = "climatebert/distilroberta-base-climate-specificity"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)
    
  3. Load the dataset:

    dataset_name = "climatebert/climate_specificity"
    dataset = datasets.load_dataset(dataset_name, split="test")
    
  4. Create a pipeline for text classification:

    # device=0 runs inference on the first GPU; use device=-1 for CPU
    pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0)
    
  5. Run the pipeline on the dataset:

    from transformers.pipelines.pt_utils import KeyDataset
    from tqdm.auto import tqdm
    
    for out in tqdm(pipe(KeyDataset(dataset, "text"), padding=True, truncation=True)):
        print(out)
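
For ad-hoc use, the same pipeline also accepts a single string. The paragraph below and the output shown in the comment are purely illustrative:

    example = (
        "We aim to reduce our Scope 1 and 2 greenhouse gas emissions "
        "by 30% by 2030, relative to a 2019 baseline."
    )
    print(pipe(example, padding=True, truncation=True))
    # Illustrative output: [{'label': 'specific', 'score': 0.98}]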
    

Suggestion for Cloud GPUs

To enhance performance, especially with large datasets or when running multiple inferences, consider using cloud-based GPU services such as AWS EC2, Google Cloud Platform, or Azure.
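
If the same script must run both with and without a GPU, the pipeline's device can be selected at runtime. A minimal sketch, assuming PyTorch is installed and that model and tokenizer are the objects loaded in the guide above:

    import torch
    from transformers import pipeline

    # Use the first GPU when available, otherwise fall back to CPU
    device = 0 if torch.cuda.is_available() else -1
    pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=device)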

License

The DistilRoBERTa-Base-Climate-Specificity model is released under the Apache 2.0 License, which permits use, modification, and distribution, provided the license's conditions, such as retaining copyright and license notices, are met.
