Japanese GPT-2 Small
Introduction
The Japanese GPT-2 Small is a compact language model designed for generating Japanese text. Developed by Rinna Co., Ltd., this model is based on the GPT-2 architecture and optimized for the Japanese language.
Architecture
The model features a 12-layer transformer architecture with a hidden size of 768. It employs a sentencepiece-based tokenizer for handling Japanese text, with the vocabulary trained specifically on Japanese Wikipedia data.
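These figures can be checked directly against the published configuration. The snippet below is a minimal sketch that only downloads the config; it assumes the transformers package is installed and that the model ID rinna/japanese-gpt2-small is reachable.

  from transformers import AutoConfig

  # Fetch only the configuration and print the architecture parameters described above.
  config = AutoConfig.from_pretrained("rinna/japanese-gpt2-small")

  print(config.n_layer)     # number of transformer layers (expected: 12)
  print(config.n_embd)      # hidden size (expected: 768)
  print(config.n_head)      # number of attention heads
  print(config.vocab_size)  # size of the sentencepiece vocabulary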
Training
The model was trained using the CC-100 and Japanese Wikipedia datasets. Training was conducted on 8 V100 GPUs over approximately 15 days, achieving a perplexity of around 21 on a validation set derived from CC-100.
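For reference, perplexity on a held-out text is obtained by exponentiating the model's average cross-entropy loss. The snippet below is a minimal sketch of that calculation, assuming torch and transformers are installed; it uses a placeholder sentence rather than the actual CC-100 validation split.

  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
  model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
  model.eval()

  text = "日本語の文章を入力します。"  # placeholder held-out text
  input_ids = tokenizer(text, return_tensors="pt").input_ids

  with torch.no_grad():
      # Passing the inputs as labels returns the average cross-entropy loss;
      # exp(loss) is the perplexity on this text.
      loss = model(input_ids, labels=input_ids).loss

  print(torch.exp(loss).item())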
Guide: Running Locally
To use the Japanese GPT-2 Small model locally, follow these steps:
- Install the Transformers library:

  pip install transformers

- Load the model and tokenizer:

  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
  tokenizer.do_lower_case = True  # set due to a tokenizer configuration issue
  model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
- Run inference: prepare an input prompt, tokenize it, and use the loaded model and tokenizer to generate text (a worked sketch follows this list).
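The following sketch ties the steps together: it tokenizes a short Japanese prompt and generates a continuation. The prompt string and the generation settings (max_new_tokens, nucleus sampling) are illustrative choices, not values prescribed by the model card.

  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
  tokenizer.do_lower_case = True
  model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
  model.eval()

  prompt = "昔々あるところに"  # example prompt: "Once upon a time, in a certain place"
  input_ids = tokenizer(prompt, return_tensors="pt").input_ids

  with torch.no_grad():
      output_ids = model.generate(
          input_ids,
          max_new_tokens=50,  # length of the generated continuation
          do_sample=True,     # sample instead of greedy decoding
          top_p=0.95,         # nucleus sampling
      )

  print(tokenizer.decode(output_ids[0], skip_special_tokens=True))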
For better performance, especially during training or extensive inference, it is recommended to use cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
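On a machine with a CUDA-capable GPU, whether local or in the cloud, the model and its inputs can be moved to the GPU before generation. This is a brief sketch assuming a PyTorch build with CUDA support.

  import torch
  from transformers import AutoModelForCausalLM

  # Place the model on the GPU when one is available, otherwise fall back to CPU.
  device = "cuda" if torch.cuda.is_available() else "cpu"
  model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small").to(device)

  # Tokenized inputs must live on the same device, e.g. input_ids.to(device),
  # before calling model.generate(...).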
License
The Japanese GPT-2 Small model is released under the MIT License, allowing for flexibility in usage and distribution.