ConfliBERT-cont-uncased

snowood1

Introduction

ConfliBERT is a pre-trained language model specifically designed to understand political conflict and violence. It is released in four versions that differ in pretraining strategy (from scratch vs. continual) and vocabulary casing.

Architecture

ConfliBERT builds on the BERT architecture, adapting it to the domain of political conflict and violence. It was pretrained using two approaches:

  • Pretraining from scratch with a custom vocabulary.
  • Continual pretraining from a BERT checkpoint using the original BERT vocabulary.
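The difference between the two approaches can be sketched with Transformers configuration objects. This is a minimal illustration, not the actual ConfliBERT training setup; the vocabulary size shown is a placeholder, not the model's real value.

```python
from transformers import BertConfig

# Approach 1: pretraining from scratch -- a fresh BertConfig with a
# custom, domain-specific vocabulary size (64,000 here is illustrative,
# not ConfliBERT's actual vocabulary size).
scratch_config = BertConfig(vocab_size=64_000)

# Approach 2: continual pretraining -- start from an existing BERT
# checkpoint and keep its vocabulary, e.g.:
#   model = BertForMaskedLM.from_pretrained("bert-base-uncased")
# (left as a comment so this sketch does not download a checkpoint)

print(scratch_config.vocab_size)
```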

Training

The model has four variations:

  1. ConfliBERT-scr-uncased: Pretrained from scratch with an uncased custom vocabulary. This is the preferred version for most applications.
  2. ConfliBERT-scr-cased: Pretrained from scratch with a cased custom vocabulary.
  3. ConfliBERT-cont-uncased: Continual pretraining using the original BERT's uncased vocabulary.
  4. ConfliBERT-cont-cased: Continual pretraining using the original BERT's cased vocabulary.
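A small helper can map these four variants to Hugging Face Hub model ids. The `snowood1/...` ids below are an assumption based on the hosting account; verify them on the Hub before use.

```python
# Map the four ConfliBERT variants to (assumed) Hugging Face Hub ids.
CONFLIBERT_VARIANTS = {
    ("scratch", "uncased"): "snowood1/ConfliBERT-scr-uncased",
    ("scratch", "cased"): "snowood1/ConfliBERT-scr-cased",
    ("continual", "uncased"): "snowood1/ConfliBERT-cont-uncased",
    ("continual", "cased"): "snowood1/ConfliBERT-cont-cased",
}

def conflibert_id(pretraining: str = "scratch", casing: str = "uncased") -> str:
    """Return the Hub id for a ConfliBERT variant."""
    return CONFLIBERT_VARIANTS[(pretraining, casing)]

print(conflibert_id("continual", "uncased"))
```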

Guide: Running Locally

  1. Clone the Repository:
    Download the ConfliBERT repository from GitHub:

    git clone https://github.com/eventdata/ConfliBERT/
    cd ConfliBERT
    
  2. Install Dependencies:
    Make sure you have all necessary packages installed. You can use a virtual environment:

    python -m venv env
    source env/bin/activate
    pip install -r requirements.txt
    
  3. Run the Model:
    You can run the model using PyTorch and the Transformers library from Hugging Face.
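    A minimal fill-mask sketch using the Transformers pipeline is shown below. The model id assumes the snowood1 account on the Hugging Face Hub, and the first run downloads the model weights.

```python
from transformers import pipeline

# Load this model card's checkpoint for masked-token prediction.
# The model id is an assumption; confirm it on the Hugging Face Hub.
fill = pipeline("fill-mask", model="snowood1/ConfliBERT-cont-uncased")

# The uncased BERT vocabulary uses the [MASK] token.
for pred in fill("the rebels launched an [MASK] on the capital."):
    print(pred["token_str"], round(pred["score"], 3))
```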

  4. Cloud GPUs:
    For optimal performance, especially with larger datasets, consider using cloud-based GPU services like AWS EC2, Google Cloud Platform, or Azure.

License

ConfliBERT is licensed under the GNU General Public License v3.0 (GPL-3.0).
