GPT-2 Turkish Cased
Introduction
The GPT-2 Turkish model is a language model trained specifically for Turkish text generation. It serves as a base model for fine-tuning on additional Turkish text datasets.
Architecture
The model uses a byte-level Byte Pair Encoding (BPE) tokenizer with a 52K vocabulary built from the OSCAR corpus. The model weights are compatible with both PyTorch and TensorFlow frameworks.
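As a quick illustration of the tokenizer, the following sketch loads it and shows how a Turkish sentence is split into byte-level BPE subwords; the example sentence is arbitrary.

```python
from transformers import AutoTokenizer

# Load the model's byte-level BPE tokenizer (52K vocabulary).
tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")

text = "Merhaba dünya, bugün hava çok güzel."  # "Hello world, the weather is very nice today."
tokens = tokenizer.tokenize(text)  # subword pieces produced by the BPE merges
ids = tokenizer.encode(text)       # the corresponding vocabulary ids

print(tokens)
print(ids)
```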
Training
The training corpus was Turkish text drawn from the OSCAR corpus. The model was trained for five epochs on two NVIDIA RTX 2080 Ti GPUs, and the training logs are available on TensorBoard for further insight into the training process.
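Because the model is intended as a base for further fine-tuning, a minimal continued-training sketch with the `Trainer` API is shown below. The corpus file, hyperparameters, and output directory are illustrative assumptions, not the settings of the original five-epoch run.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "redrussianarmy/gpt2-turkish-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical fine-tuning corpus: one Turkish document per line.
dataset = load_dataset("text", data_files={"train": "my_turkish_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# GPT-2 defines no padding token; reuse EOS so batches can be padded.
tokenizer.pad_token = tokenizer.eos_token
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-turkish-finetuned",  # illustrative output path
    num_train_epochs=1,                   # assumption; tune for your data
    per_device_train_batch_size=4,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"], data_collator=collator)
trainer.train()
```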
Guide: Running Locally
To use the Turkish GPT-2 model locally, follow these steps:
- Install Libraries: Ensure you have the `transformers` library installed:

  ```bash
  pip install transformers
  ```
- Model Loading: Use the following code snippet to load the model and tokenizer (a lower-level generation example using these objects is sketched after this list):

  ```python
  from transformers import AutoTokenizer, AutoModelWithLMHead

  # Note: AutoModelWithLMHead is deprecated in recent transformers releases;
  # AutoModelForCausalLM is the current equivalent for GPT-2.
  tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
  model = AutoModelWithLMHead.from_pretrained("redrussianarmy/gpt2-turkish-cased")
  ```
- Text Generation: Use the Transformers pipeline for generating text:

  ```python
  from transformers import pipeline

  pipe = pipeline(
      "text-generation",
      model="redrussianarmy/gpt2-turkish-cased",
      tokenizer="redrussianarmy/gpt2-turkish-cased",
  )
  # Generation settings such as max_length are passed at call time.
  # Prompt: "Walking along the road in the early evening, "
  text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
  print(text)
  ```
- Cloning the Repository: To clone the model repository, use:

  ```bash
  git lfs install
  git clone https://huggingface.co/redrussianarmy/gpt2-turkish-cased
  ```
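As a lower-level alternative to the pipeline, the sketch below calls `generate()` directly on the objects loaded in the model-loading step; the sampling parameters are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
model = AutoModelForCausalLM.from_pretrained("redrussianarmy/gpt2-turkish-cased")

# Prompt: "Walking along the road in the early evening, "
inputs = tokenizer("Akşamüstü yolda ilerlerken, ", return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_length=100,   # illustrative; adjust as needed
        do_sample=True,   # sample rather than decode greedily
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```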
For enhanced performance, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.
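If a GPU is available, the model can be placed on it explicitly; the sketch below uses the pipeline's `device` argument, assuming a CUDA device at index 0.

```python
import torch
from transformers import pipeline

# Use the first CUDA GPU if one is available, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else -1

pipe = pipeline(
    "text-generation",
    model="redrussianarmy/gpt2-turkish-cased",
    device=device,
)
print(pipe("Akşamüstü yolda ilerlerken, ", max_length=100)[0]["generated_text"])
```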
License
The model and associated files are available for use under the terms specified in the repository. Please refer to the repository for more details regarding usage rights and limitations.