alberti-bert-base-multilingual-cased

flax-community

Introduction

ALBERTI is a pair of BERT-based multilingual models designed for poetry: one trained on verses and one on stanzas. Both were further pre-trained on the PULPO corpus using the Flax library. The project was developed during the Flax/JAX Community Week, sponsored by Hugging Face and Google.

Architecture

The ALBERTI models are based on the BERT architecture and are multilingual, designed to work with poetry in a variety of languages. Like other BERT models, they support tasks such as masked language modeling (fill-mask).

Training

ALBERTI was trained on the PULPO corpus, a multilingual collection of poetry containing over 95 million words, with resources in languages including Spanish, English, French, Italian, Czech, Portuguese, Arabic, Chinese, Finnish, German, Hungarian, and Russian. Training used the Flax library on TPUs provided by Google.

Guide: Running Locally

  1. Clone the Repository: Start by cloning the model repository from Hugging Face.
  2. Install Dependencies: Ensure that you have transformers, flax, and other required libraries installed.
  3. Load the Model: Use the transformers library to load the ALBERTI model.
  4. Inference: Perform tasks such as masked language modeling using provided example scripts.
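The loading and inference steps above can be sketched with the transformers fill-mask pipeline. This is a minimal sketch, not an official example script; the Hub id below is inferred from the model and organization names and may need adjusting, and the sample verse is only an illustration.

```python
from transformers import pipeline

# Hub id assumed from the model card; verify against the actual repository.
MODEL_ID = "flax-community/alberti-bert-base-multilingual-cased"


def top_predictions(text, model_id=MODEL_ID, k=5):
    """Return the top-k fill-mask predictions for a verse containing [MASK].

    BERT-style models expect the literal [MASK] token in the input.
    """
    fill = pipeline("fill-mask", model=model_id)
    return [pred["token_str"] for pred in fill(text, top_k=k)]


if __name__ == "__main__":
    # Example Spanish verse with one masked word (illustrative only).
    print(top_predictions("La primavera [MASK] ha venido."))
```

Running this downloads the checkpoint from the Hugging Face Hub on first use, so an internet connection (or a local cache) is required.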

For faster inference and training, consider using cloud GPUs, such as those provided by Google Cloud or AWS.

License

The ALBERTI model is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, allowing for sharing and adaptation with appropriate credit.
