GPT-2 Turkish Cased
Introduction
The GPT-2 Turkish model is a language model trained specifically for Turkish text generation. It serves as a base model for fine-tuning on additional Turkish text datasets.
Architecture
The model uses a byte-level Byte Pair Encoding (BPE) tokenizer with a 52K vocabulary built from the OSCAR corpus. The model weights are compatible with both PyTorch and TensorFlow frameworks.
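As a quick illustration of the tokenizer, the following sketch loads it and shows how a Turkish sentence is split into byte-level BPE subwords; the example sentence is arbitrary.

```python
from transformers import AutoTokenizer

# Load the model's byte-level BPE tokenizer (52K vocabulary).
tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")

text = "Merhaba dünya, bugün hava çok güzel."  # "Hello world, the weather is very nice today."
tokens = tokenizer.tokenize(text)  # subword pieces produced by the BPE merges
ids = tokenizer.encode(text)       # the corresponding vocabulary ids

print(tokens)
print(ids)
```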
Training
The training corpus was Turkish text drawn from the OSCAR corpus. The model was trained for five epochs on two NVIDIA RTX 2080 Ti GPUs, and the training logs are available on TensorBoard for further insight into the training process.
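Because the model is intended as a base for further fine-tuning, a minimal continued-training sketch with the `Trainer` API is shown below. The corpus file, hyperparameters, and output directory are illustrative assumptions, not the settings of the original five-epoch run.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "redrussianarmy/gpt2-turkish-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical fine-tuning corpus: one Turkish document per line.
dataset = load_dataset("text", data_files={"train": "my_turkish_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# GPT-2 defines no padding token; reuse EOS so batches can be padded.
tokenizer.pad_token = tokenizer.eos_token
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-turkish-finetuned",  # illustrative output path
    num_train_epochs=1,                   # assumption; tune for your data
    per_device_train_batch_size=4,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"], data_collator=collator)
trainer.train()
```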
Guide: Running Locally
To use the Turkish GPT-2 model locally, follow these steps:
- Install Libraries: Ensure you have the `transformers` library installed:

  ```bash
  pip install transformers
  ```
- Model Loading: Use the following code snippet to load the model and tokenizer (a lower-level generation example using these objects is sketched after this list):

  ```python
  from transformers import AutoTokenizer, AutoModelWithLMHead

  # Note: AutoModelWithLMHead is deprecated in recent transformers releases;
  # AutoModelForCausalLM is the current equivalent for GPT-2.
  tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
  model = AutoModelWithLMHead.from_pretrained("redrussianarmy/gpt2-turkish-cased")
  ```
- Text Generation: Use the Transformers pipeline for generating text:

  ```python
  from transformers import pipeline

  pipe = pipeline(
      "text-generation",
      model="redrussianarmy/gpt2-turkish-cased",
      tokenizer="redrussianarmy/gpt2-turkish-cased",
  )
  # Generation settings such as max_length are passed at call time.
  # Prompt: "Walking along the road in the early evening, "
  text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
  print(text)
  ```
- Cloning the Repository: To clone the model repository, use:

  ```bash
  git lfs install
  git clone https://huggingface.co/redrussianarmy/gpt2-turkish-cased
  ```
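As a lower-level alternative to the pipeline, the sketch below calls `generate()` directly on the objects loaded in the model-loading step; the sampling parameters are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
model = AutoModelForCausalLM.from_pretrained("redrussianarmy/gpt2-turkish-cased")

# Prompt: "Walking along the road in the early evening, "
inputs = tokenizer("Akşamüstü yolda ilerlerken, ", return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_length=100,   # illustrative; adjust as needed
        do_sample=True,   # sample rather than decode greedily
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```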
For enhanced performance, consider using cloud GPUs from providers such as AWS, Google Cloud, or Azure.
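If a GPU is available, the model can be placed on it explicitly; the sketch below uses the pipeline's `device` argument, assuming a CUDA device at index 0.

```python
import torch
from transformers import pipeline

# Use the first CUDA GPU if one is available, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else -1

pipe = pipeline(
    "text-generation",
    model="redrussianarmy/gpt2-turkish-cased",
    device=device,
)
print(pipe("Akşamüstü yolda ilerlerken, ", max_length=100)[0]["generated_text"])
```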
License
The model and associated files are available for use under the terms specified in the repository. Please refer to the repository for more details regarding usage rights and limitations.