snowood1/ConfliBERT-scr-uncased
Introduction
ConfliBERT is a pre-trained language model specifically designed for analyzing political conflict and violence. It features four distinct versions, each with different pretraining methodologies and vocabulary considerations.
Architecture
ConfliBERT is built on the BERT architecture and supports various configurations, including uncased and cased vocabularies. The model versions are:
- ConfliBERT-scr-uncased: Trained from scratch with a custom uncased vocabulary.
- ConfliBERT-scr-cased: Trained from scratch with a custom cased vocabulary.
- ConfliBERT-cont-uncased: Continually pre-trained using BERT's original uncased vocabulary.
- ConfliBERT-cont-cased: Continually pre-trained using BERT's original cased vocabulary.
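The four versions above correspond to separate checkpoints on the Hugging Face Hub. As a minimal sketch (assuming the `snowood1/` namespace of this model card, and the `ConfliBERT-{scr,cont}-{uncased,cased}` naming pattern), a small helper can resolve a pretraining style and casing to a Hub model ID:

```python
# Map each ConfliBERT variant to its Hub ID (assumed snowood1/ namespace;
# verify the exact IDs on the Hugging Face Hub before use).
CONFLIBERT_VERSIONS = {
    ("scratch", "uncased"): "snowood1/ConfliBERT-scr-uncased",
    ("scratch", "cased"): "snowood1/ConfliBERT-scr-cased",
    ("continual", "uncased"): "snowood1/ConfliBERT-cont-uncased",
    ("continual", "cased"): "snowood1/ConfliBERT-cont-cased",
}

def conflibert_id(pretraining: str, casing: str) -> str:
    """Return the Hub model ID for a given pretraining style and vocabulary casing."""
    try:
        return CONFLIBERT_VERSIONS[(pretraining, casing)]
    except KeyError:
        raise ValueError(
            f"Unknown version: pretraining={pretraining!r}, casing={casing!r}"
        ) from None
```

For example, `conflibert_id("scratch", "uncased")` yields the ID of the checkpoint this card describes.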
Training
ConfliBERT's training involves either pretraining from scratch or continual pretraining. The scratch versions use a custom domain-specific vocabulary, while the continual versions build on the original BERT vocabulary. More detailed training information is available in the GitHub repository.
Guide: Running Locally
To run ConfliBERT locally, follow these steps:
- Clone the repository from GitHub.
- Install dependencies using `pip install -r requirements.txt`.
- Download the desired ConfliBERT model version from Hugging Face.
- Load the model in your Python script using the Transformers library.
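The last step can be sketched with the Transformers library. This is a minimal example, not the repository's official loading code; the Hub ID assumes the `snowood1/` namespace of this model card, and any of the four versions can be substituted:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Any of the four ConfliBERT versions can be substituted here.
MODEL_NAME = "snowood1/ConfliBERT-scr-uncased"

def load_conflibert(model_name: str = MODEL_NAME):
    """Download (or reuse from the local cache) ConfliBERT's tokenizer
    and masked-language-model head via the Transformers auto classes."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    return tokenizer, model
```

Calling `load_conflibert()` downloads the weights on first use and caches them locally; for fine-tuning on a downstream task, swap `AutoModelForMaskedLM` for a task head such as `AutoModelForSequenceClassification`.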
For optimal performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
License
ConfliBERT is licensed under the GPL-3.0 license, which allows for redistribution and modification under the same license.