japanese-gpt2-medium
Introduction
The japanese-gpt2-medium model is a medium-sized language model for generating Japanese text. Developed by Rinna Co., Ltd., it uses the GPT-2 architecture and is available on the Hugging Face Hub.
Architecture
The model is a 24-layer transformer-based language model with a hidden size of 1024, trained specifically for Japanese.
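As a quick sanity check, these dimensions can be read from the published configuration file alone. The snippet below is a minimal sketch using the transformers AutoConfig API; it fetches only the configuration, not the model weights.

```python
from transformers import AutoConfig

# Downloads only config.json, not the weights.
config = AutoConfig.from_pretrained("rinna/japanese-gpt2-medium")

# GPT-2 style configs expose depth and width as n_layer / n_embd.
print(config.n_layer, config.n_embd)  # expected: 24 1024
```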
Training
The japanese-gpt2-medium model was trained on Japanese CC-100 and Japanese Wikipedia. Training was conducted on 8 V100 GPUs over approximately 30 days, and the model reaches a perplexity of around 18 on a chosen validation set.
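Perplexity is the exponential of the average per-token cross-entropy loss. The sketch below is only illustrative: it scores an arbitrary sample sentence (not the validation set behind the reported figure) using the transformers library, with the model loaded as described in the guide further down.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-medium", use_fast=False)
tokenizer.do_lower_case = True  # Adjust for tokenizer bug
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
model.eval()

# Arbitrary sample text -- not the validation set used for the reported perplexity.
text = "日本語の文章を生成する言語モデルです。"
input_ids = tokenizer.encode(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy loss.
    loss = model(input_ids, labels=input_ids).loss

print(f"perplexity = {torch.exp(loss).item():.1f}")
```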
Guide: Running Locally
To run the model locally, follow these steps:
- Install dependencies: Ensure you have the `transformers` library installed (loading this model's slow tokenizer may also require the `sentencepiece` package). You can install it using pip:

  ```bash
  pip install transformers
  ```
- Load the model: Use the following code snippet to load the model and tokenizer:

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-medium", use_fast=False)
  tokenizer.do_lower_case = True  # Adjust for tokenizer bug
  model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
  ```
- Run inference: Use the loaded model and tokenizer to generate text from a prompt; see the sketch after this list.
- Cloud GPUs: For efficient processing, consider using cloud GPU services like AWS EC2, Google Cloud, or Azure.
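A minimal inference sketch is shown below. The prompt is just an example, and the sampling parameters (`max_length`, `top_k`, `top_p`) are illustrative choices, not values prescribed by the model card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-medium", use_fast=False)
tokenizer.do_lower_case = True  # Adjust for tokenizer bug
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
model.eval()

prompt = "生命、宇宙、そして万物についての究極の疑問の答えは"  # example prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=100,
        do_sample=True,   # sample instead of greedy decoding
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

If a GPU is available, move the model and the input tensor to it (e.g. with `.to("cuda")`) before calling `generate` for faster decoding.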
License
The model and its implementation are distributed under the MIT License, which allows for flexible use, modification, and distribution. For more details, see the MIT License.