rugpt3medium-tathagata

radm

Introduction

RUGPT3MEDIUM-TATHAGATA is a Russian text-generation model based on the GPT-2 architecture. It was fine-tuned to generate text on a dataset compiled from summaries of Buddhist, Hindu, and Advaita texts, with fine-tuning performed on an NVIDIA RTX 3080 GPU.

Architecture

The model is built on the rugpt3medium_based_on_gpt2 architecture, which is a variant of GPT-2. It supports the PyTorch framework and uses Safetensors for safe and efficient storage of model weights. The model primarily focuses on causal language modeling and text generation tasks in Russian.
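The base architecture can be confirmed programmatically. The snippet below is a minimal sketch that loads the checkpoint's configuration with the Transformers AutoConfig API and prints the fields describing the GPT-2 variant; it assumes the repository name used in the guide below.

    from transformers import AutoConfig
    
    # Load the configuration of the fine-tuned checkpoint.
    config = AutoConfig.from_pretrained("radm/rugpt3medium-tathagata")
    
    # model_type should report "gpt2"; n_layer, n_embd and n_head describe
    # the medium-sized GPT-2 variant used by rugpt3medium_based_on_gpt2.
    print(config.model_type, config.n_layer, config.n_embd, config.n_head)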

Training

RUGPT3MEDIUM-TATHAGATA was fine-tuned on a dataset of key philosophical texts, including the Diamond Sutra, the Lankavatara Sutra, quotes from Sri Nisargadatta Maharaj, and the Bhagavad Gita. The dataset is publicly available.

Guide: Running Locally

To run the RUGPT3MEDIUM-TATHAGATA model locally, follow these steps:

  1. Setup Environment:

    • Ensure you have PyTorch and the Transformers library installed.
    • Use a CUDA-enabled GPU for optimal performance.
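    • As a quick sanity check, the minimal sketch below prints the installed versions and whether CUDA is visible (the pip command in the comment is one common way to install the dependencies):

    # Assumes the dependencies were installed, e.g. with: pip install torch transformers
    import torch
    import transformers
    
    print("transformers:", transformers.__version__)
    print("torch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    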
  2. Model and Tokenizer Initialization:

    from transformers import GPT2LMHeadModel, GPT2Tokenizer
    import torch
    
    # Use the first CUDA GPU if one is available, otherwise fall back to CPU.
    DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    
    # The fine-tuned weights come from the radm/rugpt3medium-tathagata repository;
    # the tokenizer is taken from the base rugpt3medium_based_on_gpt2 model.
    model_name_or_path = "radm/rugpt3medium-tathagata"
    tokenizer = GPT2Tokenizer.from_pretrained("sberbank-ai/rugpt3medium_based_on_gpt2")
    model = GPT2LMHeadModel.from_pretrained(model_name_or_path).to(DEVICE)
    
  3. Text Generation:

    text = "В чем смысл жизни?\n"
    input_ids = tokenizer.encode(text, return_tensors="pt").to(DEVICE)
    model.eval()
    with torch.no_grad():
        out = model.generate(input_ids, 
                             do_sample=True,
                             num_beams=4,
                             temperature=1.1,
                             top_p=0.9,
                             top_k=50,
                             max_length=250,
                             min_length=50,
                             early_stopping=True,
                             no_repeat_ngram_size=2)
    generated_text = list(map(tokenizer.decode, out))[0]
    print(generated_text)
    
  4. Cloud GPUs:

    • If a suitable local GPU is not available, consider cloud services such as AWS, GCP, or Azure, which offer CUDA-capable GPU instances comparable to or more powerful than an NVIDIA RTX 3080.
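
As an alternative to steps 2 and 3 above, the same checkpoint can be driven through the Transformers pipeline API. The snippet below is a minimal sketch assuming the same repository names as in the guide; the generation parameters are illustrative, not prescribed by the model card.

    from transformers import pipeline
    
    # Text-generation pipeline using the fine-tuned model and the base tokenizer.
    generator = pipeline(
        "text-generation",
        model="radm/rugpt3medium-tathagata",
        tokenizer="sberbank-ai/rugpt3medium_based_on_gpt2",
        device=0,  # GPU index; use device=-1 to run on CPU
    )
    
    # Prompt in Russian: "What is the meaning of life?"
    result = generator("В чем смысл жизни?\n", max_length=250, do_sample=True, top_p=0.9)
    print(result[0]["generated_text"])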

License

RUGPT3MEDIUM-TATHAGATA is released under the Apache-2.0 license, which permits personal and commercial use, modification, and redistribution, provided the license and copyright notices are retained.
