bert-fa-zwnj-base
HooshvareLab
Introduction
ParsBERT is a monolingual language model for Persian language understanding. It is based on Google's BERT architecture and was pre-trained on a large collection of Persian text covering diverse writing styles and subjects, including scientific articles, novels, and news. The model also handles the zero-width non-joiner (ZWNJ, U+200C), a character that Persian orthography uses inside many compound and inflected words.
Architecture
ParsBERT uses the Transformer-based BERT architecture, which has proven effective across a wide range of natural language processing tasks. It is tailored to Persian by accounting for language-specific features, most notably the zero-width non-joiner character.
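As a quick illustration of the ZWNJ handling, the tokenizer can be inspected directly. This is a minimal sketch assuming the Hub model ID HooshvareLab/bert-fa-zwnj-base (taken from the model name and organization above) and an installed Transformers package; the exact tokens produced depend on the trained vocabulary.

from transformers import AutoTokenizer

# Model ID assumed from the model name and organization above.
tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-fa-zwnj-base")

# The word می‌رود ("goes") contains a ZWNJ (U+200C) between the prefix and the stem;
# printing the tokens shows how the tokenizer treats it.
print(tokenizer.tokenize("او به مدرسه می‌رود"))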
Training
The model was pre-trained on a comprehensive set of Persian corpora spanning multiple text types. This broad coverage gives ParsBERT a rich vocabulary and enables it to model Persian text with improved accuracy.
Guide: Running Locally
To run ParsBERT locally, follow these basic steps:
- Installation: Make sure you have Python 3.6 or higher installed. Set up a virtual environment and install the Hugging Face Transformers library.
pip install transformers torch
- Download the Model: ParsBERT is available on the Hugging Face Model Hub; the Transformers library downloads and caches the weights automatically the first time you load the model.
- Run Inference: Use the model with the Transformers library to perform tasks such as fill-mask or text classification, as shown in the sketch after this list.
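The following is a minimal fill-mask sketch, assuming the model ID HooshvareLab/bert-fa-zwnj-base (from the name and organization above) and a PyTorch backend; the example sentence ("Tehran is the capital of [MASK].") is purely illustrative.

from transformers import pipeline

# Build a fill-mask pipeline; model and tokenizer are fetched from the
# Hub and cached on first use.
fill_mask = pipeline("fill-mask", model="HooshvareLab/bert-fa-zwnj-base")

# The pipeline returns the top candidates for the masked token with scores.
for prediction in fill_mask("تهران پایتخت [MASK] است."):
    print(prediction["token_str"], round(prediction["score"], 3))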
For optimal performance, consider using a cloud GPU service such as Google Colab, AWS, or Azure, which can significantly speed up processing times for large models like ParsBERT.
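When a GPU is available, the same pipeline can be placed on it via the device argument; this sketch assumes a CUDA-capable PyTorch installation and falls back to the CPU otherwise.

import torch
from transformers import pipeline

# device=0 selects the first GPU; -1 runs on the CPU.
device = 0 if torch.cuda.is_available() else -1
fill_mask = pipeline("fill-mask", model="HooshvareLab/bert-fa-zwnj-base", device=device)
print(fill_mask("تهران پایتخت [MASK] است.")[0]["sequence"])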
License
ParsBERT is released under the Apache License 2.0, which permits use, modification, and distribution, provided that redistributed copies and derivative works retain the license text and required notices.