ConfliBERT-cont-uncased
Introduction
ConfliBERT is a pre-trained language model specifically designed to understand political conflict and violence. It comes in four versions, each catering to different pretraining needs and vocabulary requirements.
Architecture
ConfliBERT builds on the BERT architecture, adapting it to focus on political conflict and violence. It involves two primary approaches to pretraining:
- Pretraining from scratch with a custom vocabulary.
- Continual pretraining using original BERT vocabulary.
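The practical difference between the two approaches is where the vocabulary and initial weights come from. A minimal sketch with Hugging Face Transformers (the 64,000 vocabulary size below is a placeholder, not ConfliBERT's actual custom vocabulary size):

```python
from transformers import BertConfig

# Pretraining from scratch: a fresh config with a custom, domain-specific
# vocabulary. Weights would be randomly initialized from this config.
# (vocab_size=64_000 is a placeholder value for illustration.)
scratch_config = BertConfig(vocab_size=64_000)

# Continual pretraining instead starts from the released BERT checkpoint,
# reusing its trained weights and its default vocabulary:
# model = BertForMaskedLM.from_pretrained("bert-base-uncased")  # downloads weights

print(scratch_config.vocab_size)  # custom vocab for the from-scratch variants
print(BertConfig().vocab_size)    # BERT's default vocabulary size: 30522
```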
Training
The model has four variations:
- ConfliBERT-scr-uncased: Pretrained from scratch with an uncased custom vocabulary. This is the preferred version for most applications.
- ConfliBERT-scr-cased: Pretrained from scratch with a cased custom vocabulary.
- ConfliBERT-cont-uncased: Continual pretraining using the original BERT's uncased vocabulary.
- ConfliBERT-cont-cased: Continual pretraining using the original BERT's cased vocabulary.
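The four variant names follow a regular pattern, so selecting a checkpoint can be reduced to two choices. A small helper illustrating this (the `snowood1/` Hub namespace is an assumption about where the checkpoints are hosted):

```python
def conflibert_checkpoint(pretraining: str, casing: str) -> str:
    """Build the Hub model ID for one of the four ConfliBERT variants.

    `pretraining` is "scr" (from scratch) or "cont" (continual pretraining);
    `casing` is "cased" or "uncased". The "snowood1/" namespace is an
    assumption, not confirmed by the ConfliBERT documentation.
    """
    if pretraining not in {"scr", "cont"} or casing not in {"cased", "uncased"}:
        raise ValueError("unknown ConfliBERT variant")
    return f"snowood1/ConfliBERT-{pretraining}-{casing}"

# The uncased from-scratch variant is the one recommended for most applications:
print(conflibert_checkpoint("scr", "uncased"))  # snowood1/ConfliBERT-scr-uncased
```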
Guide: Running Locally
- Clone the Repository: Download the ConfliBERT repository from GitHub:
  git clone https://github.com/eventdata/ConfliBERT/
  cd ConfliBERT
- Install Dependencies: Make sure you have all necessary packages installed. You can use a virtual environment:
  python -m venv env
  source env/bin/activate
  pip install -r requirements.txt
- Run the Model: You can run the model using PyTorch and the Transformers library from Hugging Face.
- Cloud GPUs:
For optimal performance, especially with larger datasets, consider using cloud-based GPU services like AWS EC2, Google Cloud Platform, or Azure.
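For the "Run the Model" step, loading through the Transformers API might look like the sketch below. The Hub model ID is an assumption; the masked-language-model scoring lines are commented out because they download weights over the network:

```python
MODEL_ID = "snowood1/ConfliBERT-scr-uncased"  # assumed Hub location of the checkpoint

def mask_word(sentence: str, word: str) -> str:
    """Replace the first occurrence of `word` with BERT's [MASK] token."""
    return sentence.replace(word, "[MASK]", 1)

print(mask_word("the rebels attacked the village", "attacked"))
# → the rebels [MASK] the village

# With network access, the masked sentence can be scored by the model directly:
# from transformers import pipeline
# fill = pipeline("fill-mask", model=MODEL_ID)  # downloads weights on first use
# print(fill(mask_word("the rebels attacked the village", "attacked"))[:3])
```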
License
ConfliBERT is licensed under the GNU General Public License v3.0 (GPL-3.0).