roberta-kaz-large

nur-dev

Introduction

roberta-kaz-large is a RoBERTa-based language model developed specifically for the Kazakh language. It uses the RobertaForMaskedLM architecture (RoBERTa with a masked language modeling head) and was trained on the "kz-transformers/multidomain-kazakh-dataset" to generalize robustly across a variety of domains.

Architecture

The model follows the RoBERTa architecture and is trained with a masked language modeling objective. It can be loaded through the Hugging Face Transformers library, allowing flexible integration into downstream applications.
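
As a rough illustration of the masked language modeling head in action, the sketch below masks one token in a sentence and decodes the model's top prediction. The Kazakh example sentence is purely illustrative.

    import torch
    from transformers import RobertaTokenizerFast, RobertaForMaskedLM

    tokenizer = RobertaTokenizerFast.from_pretrained('nur-dev/roberta-kaz-large')
    model = RobertaForMaskedLM.from_pretrained('nur-dev/roberta-kaz-large')

    # Illustrative sentence: "I speak the <mask> language."
    text = f"Мен {tokenizer.mask_token} тілінде сөйлеймін."
    inputs = tokenizer(text, return_tensors='pt')

    with torch.no_grad():
        logits = model(**inputs).logits

    # Locate the masked position and decode the highest-scoring token for it.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    predicted_id = logits[0, mask_pos].argmax(dim=-1)
    print(tokenizer.decode(predicted_id))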

Training

Training used two NVIDIA A100 GPUs and roughly 5.3 million examples from the dataset above. The model was trained for 10 epochs, with gradient accumulation used to reach a larger effective batch size and a gradual learning-rate warmup to keep optimization stable, for a total of 208,100 training steps.
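
A minimal sketch of how such an MLM run could be configured with the Transformers Trainer is shown below; the hyperparameter values, the warmup length, and the dataset's column name are assumptions for illustration, not the released training configuration.

    from datasets import load_dataset
    from transformers import (
        RobertaTokenizerFast,
        RobertaForMaskedLM,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = RobertaTokenizerFast.from_pretrained('nur-dev/roberta-kaz-large')
    model = RobertaForMaskedLM.from_pretrained('nur-dev/roberta-kaz-large')

    dataset = load_dataset('kz-transformers/multidomain-kazakh-dataset', split='train')

    def tokenize(batch):
        # Assumes the dataset exposes a 'text' column.
        return tokenizer(batch['text'], truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

    # Dynamically masks 15% of tokens for the MLM objective.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

    args = TrainingArguments(
        output_dir='roberta-kaz-large-mlm',
        num_train_epochs=10,
        per_device_train_batch_size=16,   # assumed value
        gradient_accumulation_steps=8,    # larger effective batch via accumulation
        warmup_steps=10_000,              # assumed gradual learning-rate ramp-up
        learning_rate=1e-4,               # assumed peak learning rate
        save_steps=10_000,
    )

    Trainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        data_collator=collator,
    ).train()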

Guide: Running Locally

To use roberta-kaz-large locally, follow these steps:

  1. Install the Hugging Face Transformers library:

    pip install transformers
    
  2. Load the model and tokenizer:

    from transformers import RobertaTokenizerFast, RobertaForMaskedLM
    
    # Download the tokenizer and MLM weights from the Hugging Face Hub.
    tokenizer = RobertaTokenizerFast.from_pretrained('nur-dev/roberta-kaz-large')
    model = RobertaForMaskedLM.from_pretrained('nur-dev/roberta-kaz-large')
    
  3. Alternatively, use a pipeline for masked language modeling (MLM):

    from transformers import pipeline
    pipe = pipeline('fill-mask', model='nur-dev/roberta-kaz-large')
    
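
Once the pipeline from step 3 is created, it accepts text containing the <mask> token and returns ranked candidates for the masked position. A brief usage sketch (with an illustrative Kazakh sentence) might look like this:

    # Illustrative query; each result carries a candidate token and its score.
    results = pipe('Мен <mask> тілінде сөйлеймін.')
    for r in results:
        print(r['token_str'], round(r['score'], 3))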

For optimal performance, especially during training or large-scale inference, consider using cloud GPU services such as AWS, GCP, or Azure.

License

The model is licensed under the Academic Free License 3.0 (AFL-3.0).