Introduction

PapuGaPT2 is a Polish language model based on the GPT-2 architecture, intended to support text generation and NLP research for Polish. The model is trained with a causal language modeling objective, predicting the next token in a sequence.
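
The causal objective can be probed directly by asking the model for its most likely next token. The sketch below is illustrative and assumes the repository ships PyTorch weights (as the pipeline example later in this guide implies); if only Flax weights are present, pass from_flax=True to from_pretrained.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    tokenizer = AutoTokenizer.from_pretrained('flax-community/papuGaPT2')
    model = AutoModelForCausalLM.from_pretrained('flax-community/papuGaPT2')

    # "Stolicą Polski jest" = "The capital of Poland is"
    inputs = tokenizer('Stolicą Polski jest', return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy pick of the single most likely next token
    next_token_id = logits[0, -1].argmax().item()
    print(tokenizer.decode([next_token_id]))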

Architecture

PapuGaPT2 follows the standard decoder-only transformer architecture of GPT-2. It uses a byte-level Byte Pair Encoding (BPE) tokenizer with a vocabulary of 50,257 tokens.
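
The tokenizer can be inspected directly to confirm these details; the token split shown for the Polish prompt is purely illustrative.

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained('flax-community/papuGaPT2')

    # Byte-level BPE vocabulary size
    print(tokenizer.vocab_size)   # 50257

    # How a Polish prompt ("The greatest Polish poet was") splits into subword tokens
    print(tokenizer.tokenize('Największym polskim poetą był'))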

Training

The model was trained in a self-supervised manner on the Polish subset of the multilingual OSCAR corpus. Training was conducted on a single TPUv3 VM using the Flax causal language modeling script, and was split into three phases with different learning rates and batch sizes, reaching a final evaluation loss of 3.082 and a perplexity of 21.79.
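
The two reported metrics are directly related: perplexity is the exponential of the cross-entropy evaluation loss, as a quick check shows.

    import math

    eval_loss = 3.082              # reported evaluation loss
    print(math.exp(eval_loss))     # ~21.80, matching the reported perplexity of 21.79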

Guide: Running Locally

  1. Installation: Ensure the transformers library is installed:
    pip install transformers
  2. Load Model:
    from transformers import pipeline, set_seed
    generator = pipeline('text-generation', model='flax-community/papuGaPT2')
    set_seed(42)
    
  3. Generate Text (an expanded example with sampling options follows this list):
    generator('Największym polskim poetą był')

  4. Cloud GPUs: For better performance, consider using cloud GPUs from platforms like AWS, GCP, or Azure.
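
The one-line call in step 3 can be extended with common sampling controls. The parameter values below are illustrative defaults, not settings taken from the model card.

    from transformers import pipeline, set_seed

    generator = pipeline('text-generation', model='flax-community/papuGaPT2')
    set_seed(42)

    # Sample several continuations with top-k / top-p sampling (illustrative settings)
    outputs = generator(
        'Największym polskim poetą był',
        max_length=50,
        num_return_sequences=3,
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )
    for out in outputs:
        print(out['generated_text'])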

License

The model is intended for research purposes. Because of potential biases and limitations, it is not recommended for production use without appropriate mitigation strategies. Refer to the Hugging Face model card for detailed license information.
