Japanese GPT-2 Small
Introduction
The Japanese GPT-2 Small is a compact language model designed for generating Japanese text. Developed by Rinna Co., Ltd., this model is based on the GPT-2 architecture and optimized for the Japanese language.
Architecture
The model features a 12-layer transformer architecture with a hidden size of 768. It employs a sentencepiece-based tokenizer for handling Japanese text, with the vocabulary trained specifically on Japanese Wikipedia data.
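These figures can be checked directly against the published configuration. The snippet below is a minimal sketch that only downloads the config; it assumes the transformers package is installed and that the model ID rinna/japanese-gpt2-small is reachable.

  from transformers import AutoConfig

  # Fetch only the configuration and print the architecture parameters described above.
  config = AutoConfig.from_pretrained("rinna/japanese-gpt2-small")

  print(config.n_layer)     # number of transformer layers (expected: 12)
  print(config.n_embd)      # hidden size (expected: 768)
  print(config.n_head)      # number of attention heads
  print(config.vocab_size)  # size of the sentencepiece vocabulary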
Training
The model was trained using the CC-100 and Japanese Wikipedia datasets. Training was conducted on 8 V100 GPUs over approximately 15 days, achieving a perplexity of around 21 on a validation set derived from CC-100.
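For reference, perplexity on a held-out text is obtained by exponentiating the model's average cross-entropy loss. The snippet below is a minimal sketch of that calculation, assuming torch and transformers are installed; it uses a placeholder sentence rather than the actual CC-100 validation split.

  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
  model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
  model.eval()

  text = "日本語の文章を入力します。"  # placeholder held-out text
  input_ids = tokenizer(text, return_tensors="pt").input_ids

  with torch.no_grad():
      # Passing the inputs as labels returns the average cross-entropy loss;
      # exp(loss) is the perplexity on this text.
      loss = model(input_ids, labels=input_ids).loss

  print(torch.exp(loss).item())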
Guide: Running Locally
To use the Japanese GPT-2 Small model locally, follow these steps:
- Install the Transformers library:

  pip install transformers

- Load the model and tokenizer:

  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
  tokenizer.do_lower_case = True  # set due to a tokenizer configuration issue
  model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
- Run inference: prepare an input prompt, tokenize it, and use the loaded model and tokenizer to generate text (a worked sketch follows this list).
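The following sketch ties the steps together: it tokenizes a short Japanese prompt and generates a continuation. The prompt string and the generation settings (max_new_tokens, nucleus sampling) are illustrative choices, not values prescribed by the model card.

  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-small", use_fast=False)
  tokenizer.do_lower_case = True
  model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small")
  model.eval()

  prompt = "昔々あるところに"  # example prompt: "Once upon a time, in a certain place"
  input_ids = tokenizer(prompt, return_tensors="pt").input_ids

  with torch.no_grad():
      output_ids = model.generate(
          input_ids,
          max_new_tokens=50,  # length of the generated continuation
          do_sample=True,     # sample instead of greedy decoding
          top_p=0.95,         # nucleus sampling
      )

  print(tokenizer.decode(output_ids[0], skip_special_tokens=True))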
For better performance, especially during training or extensive inference, it is recommended to use cloud GPU services such as AWS EC2, Google Cloud Platform, or Azure.
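On a machine with a CUDA-capable GPU, whether local or in the cloud, the model and its inputs can be moved to the GPU before generation. This is a brief sketch assuming a PyTorch build with CUDA support.

  import torch
  from transformers import AutoModelForCausalLM

  # Place the model on the GPU when one is available, otherwise fall back to CPU.
  device = "cuda" if torch.cuda.is_available() else "cpu"
  model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-small").to(device)

  # Tokenized inputs must live on the same device, e.g. input_ids.to(device),
  # before calling model.generate(...).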
License
The Japanese GPT-2 Small model is released under the MIT License, allowing for flexibility in usage and distribution.