flax-community/alberti-bert-base-multilingual-cased
Introduction
ALBERTI is a pair of BERT-based multilingual models for poetry, one trained on verses and the other on stanzas. Both were further pre-trained on the PULPO corpus using the Flax library. The project was developed during the Flax/JAX Community Week, sponsored by Hugging Face and Google.
Architecture
The ALBERTI models follow the BERT architecture, starting from bert-base-multilingual-cased, and are designed to work with poetry in multiple languages. Like their base model, they support tasks such as masked language modeling, as in the sketch below.
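As a quick illustration, the following sketch runs the model through the transformers fill-mask pipeline. The repository ID is inferred from this card's title, and the example verse is purely illustrative; adjust both as needed.

```python
from transformers import pipeline

# Repository ID inferred from the model card title; adjust if it differs.
fill_mask = pipeline(
    "fill-mask",
    model="flax-community/alberti-bert-base-multilingual-cased",
)

# BERT-style models predict the token hidden behind [MASK].
verse = "La princesa está triste... ¿qué tendrá la [MASK]?"  # illustrative verse
for pred in fill_mask(verse, top_k=3):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```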
Training
ALBERTI was trained on the PULPO corpus, a multilingual collection of poetry containing over 95 million words, with material in languages including Spanish, English, French, Italian, Czech, Portuguese, Arabic, Chinese, Finnish, German, Hungarian, and Russian. Training used the Flax library on TPU resources provided by Google.
Guide: Running Locally
- Clone the Repository: Start by cloning the model repository from Hugging Face.
- Install Dependencies: Ensure that transformers, flax, and the other required libraries are installed.
- Load the Model: Use the transformers library to load the ALBERTI model.
- Inference: Perform tasks such as masked language modeling, as in the sketch after this list.
For enhanced performance, consider using cloud GPUs, such as those provided by Google Cloud or AWS.
License
The ALBERTI model is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, allowing for sharing and adaptation with appropriate credit.