japanese-gpt2-medium


Introduction

japanese-gpt2-medium is a medium-sized language model for generating Japanese text. Developed by rinna Co., Ltd., it follows the GPT-2 architecture and is available on the Hugging Face Hub.

Architecture

The model is a 24-layer transformer with a hidden size of 1024, the standard GPT-2 medium configuration, and is tailored to natural language processing in Japanese.
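
As a quick sanity check, both values can be read from the model configuration (a minimal sketch; n_layer and n_embd are the standard GPT-2 configuration fields in transformers):

    from transformers import AutoConfig

    # Download and inspect the model configuration
    config = AutoConfig.from_pretrained("rinna/japanese-gpt2-medium")
    print(config.n_layer)  # number of transformer layers, expected 24
    print(config.n_embd)   # hidden size, expected 1024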

Training

The model was trained on the Japanese portion of CC-100 and Japanese Wikipedia. Training was conducted on 8 V100 GPUs over approximately 30 days, and the model reaches a perplexity of around 18 on a selected validation set.
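
Perplexity is the exponential of the average per-token cross-entropy loss. The snippet below is a minimal sketch of how such a figure can be estimated with the released model; the example string is only a placeholder, not the original validation data:

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-medium", use_fast=False)
    model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
    model.eval()

    # Placeholder text standing in for a proper validation set
    text = "日本語の文章サンプルです。"
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss

    print(torch.exp(loss).item())  # perplexity = exp(mean cross-entropy)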

Guide: Running Locally

To run the model locally, follow these steps:

  1. Install Dependencies: Ensure the transformers library is installed; the slow, sentencepiece-based tokenizer used by this model also requires the sentencepiece package. You can install both using pip:

    pip install transformers sentencepiece
    
  2. Load the Model: Use the following code snippet to load the model and tokenizer:

    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt2-medium", use_fast=False)
    tokenizer.do_lower_case = True  # works around a tokenizer config loading issue
    
    model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt2-medium")
    
  3. Run Inference: Use the loaded model to generate text from a prompt; a minimal generation sketch is shown after this list.

  4. Cloud GPUs: For faster inference or fine-tuning, consider using cloud GPU services such as AWS EC2, Google Cloud, or Azure.
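
For step 3, the following is a minimal generation sketch that continues from the loading snippet in step 2; the prompt and sampling parameters are illustrative choices, not settings recommended by the model authors:

    import torch

    # Illustrative prompt; replace with your own Japanese text
    prompt = "昔々あるところに、"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            max_length=100,
            do_sample=True,
            top_k=50,
            top_p=0.95,
            pad_token_id=tokenizer.pad_token_id,
        )

    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))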

License

The model and its implementation are distributed under the MIT License, which permits use, modification, and distribution with minimal restrictions.
