PapuGaPT2 (flax-community)
Introduction
PapuGaPT2 is a Polish language model based on GPT-2, aimed at improving text generation capabilities for Polish NLP research. The model is trained using a causal language modeling objective, predicting the next word in a sequence.
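As a minimal sketch of what the causal language modeling objective looks like in practice (this code is not from the model card; it assumes the standard transformers AutoModelForCausalLM API and an illustrative Polish sentence), the model can score text by computing the average next-token cross-entropy loss:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("flax-community/papuGaPT2")
model = AutoModelForCausalLM.from_pretrained("flax-community/papuGaPT2")

# Causal LM objective: each position predicts the next token, so passing
# the input ids as labels yields the average next-token cross-entropy loss.
inputs = tokenizer("Najdłuższa rzeka w Polsce to Wisła.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"loss: {outputs.loss.item():.3f}")
print(f"perplexity: {torch.exp(outputs.loss).item():.2f}")
```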
Architecture
PapuGaPT2 follows the standard GPT-2 architecture, leveraging a transformer-based framework for generating text. It uses a byte-level version of Byte Pair Encoding (BPE) for tokenization with a vocabulary size of 50,257 tokens.
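To illustrate the byte-level BPE tokenizer (a hedged sketch; the example string below is ours, not from the model card), you can inspect the vocabulary size and see how a Polish sentence with diacritics is split into subword tokens:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("flax-community/papuGaPT2")

# Should match the 50,257-token vocabulary described above.
print(len(tokenizer))

# Byte-level BPE handles Polish diacritics without unknown tokens.
print(tokenizer.tokenize("Zażółć gęślą jaźń"))
```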
Training
The model was trained using the Polish subset of the multilingual Oscar corpus in a self-supervised manner. Training was conducted on a single TPUv3 VM, using a causal language modeling script for Flax. Training was split into three parts with different learning rates and batch sizes, resulting in an evaluation loss of 3.082 and a perplexity of 21.79.
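The reported perplexity is consistent with the evaluation loss, since perplexity is the exponential of the per-token cross-entropy loss:

```python
import math

eval_loss = 3.082
print(math.exp(eval_loss))  # ≈ 21.8, matching the reported perplexity of 21.79
```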
Guide: Running Locally
- Installation: Ensure you have the `transformers` library installed. Run `pip install transformers` if necessary.
- Load Model:
```python
from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='flax-community/papuGaPT2')
set_seed(42)
```
- Generate Text (a fuller sampling example follows this list):
```python
generator('Największym polskim poetą był')
```
- Cloud GPUs: For better performance, consider using cloud GPUs from platforms like AWS, GCP, or Azure.
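For more control over the generated text, the pipeline accepts the standard transformers sampling arguments such as `max_length`, `do_sample`, `top_k`, `top_p`, and `num_return_sequences`. This is a sketch with illustrative parameter values, not settings recommended by the model card:

```python
from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='flax-community/papuGaPT2')
set_seed(42)

# Nucleus sampling with top-k filtering; the values here are illustrative.
outputs = generator(
    'Największym polskim poetą był',
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    num_return_sequences=3,
)

for out in outputs:
    print(out['generated_text'])
    print('---')
```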
License
The model is intended for research purposes. Because of potential biases and limitations inherited from its training data, it is not recommended for production use without appropriate mitigation strategies. Refer to the Hugging Face model card for detailed license information.