Introduction

Genji-JP 6B is a Japanese-language model based on EleutherAI's GPT-J 6B and finetuned on a storytelling dataset, designed for generating text in the style of Japanese web novels.

Architecture

The Genji-JP 6B model is structured with:

  • 28 layers, each containing a self-attention block and a feedforward block (computed in parallel, as in GPT-J).
  • Model dimension: 4096
  • Feedforward dimension: 16384
  • 16 attention heads with a dimension of 256 each.
  • Rotary position encodings (RoPE) applied to 64 dimensions per head.
  • Vocabulary size: 50,400, using the same Byte Pair Encoding (BPE) as GPT-2/GPT-3.
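The figures above can be sanity-checked with a back-of-the-envelope parameter count. The sketch below assumes an untied input embedding and LM head and ignores biases and layer norms, which contribute only a small fraction of the total:

```python
# Rough parameter count for the architecture described above.
n_layers = 28
d_model = 4096
d_ff = 16384
n_heads = 16
d_head = 256
vocab = 50400

# The attention heads together span the full model dimension.
assert n_heads * d_head == d_model

attn = 4 * d_model * d_model  # Q, K, V, and output projections
ff = 2 * d_model * d_ff       # feedforward up- and down-projections
per_layer = attn + ff

# Embedding and (untied) LM head each hold vocab * d_model weights.
total = n_layers * per_layer + 2 * vocab * d_model
print(f"~{total / 1e9:.2f}B parameters")  # → ~6.05B parameters
```

This lands close to the model's nominal 6B size, as expected.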

Training

The model was pretrained on The Pile, a large-scale dataset curated by EleutherAI. Following pretraining, it was finetuned on a Japanese storytelling dataset to enhance its performance in generating coherent and contextually appropriate Japanese narratives.

Guide: Running Locally

To use the Genji-JP 6B model locally:

  1. Install the Hugging Face Transformers library (PyTorch is also required):
    pip install transformers torch
  2. Import the necessary classes:
    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch
    
  3. Load the tokenizer and the model:
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained("NovelAI/genji-jp", torch_dtype=torch.float16, low_cpu_mem_usage=True).eval().cuda()
    
  4. Prepare your input text and generate text:
    text = '''Your input text here'''
    tokens = tokenizer(text, return_tensors="pt").input_ids
    generated_tokens = model.generate(
        tokens.long().cuda(),
        use_cache=True,
        do_sample=True,
        temperature=1.0,
        top_p=0.9,
        repetition_penalty=1.125,
        min_length=1,
        max_length=len(tokens[0]) + 400,
        pad_token_id=tokenizer.eos_token_id,
    )
    generated_text = tokenizer.decode(generated_tokens[0]).replace("�", "")
    print("Generation:\n" + generated_text)
    
  5. In float16, the weights alone occupy roughly 12 GB of GPU memory, so a GPU with at least 16 GB is recommended; if local hardware falls short, cloud GPUs such as those from Google Cloud or AWS work well.
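The memory guidance in the last step can be checked with a quick estimate from the parameter count (a rough figure; activations, the KV cache, and framework overhead add more on top):

```python
# Back-of-the-envelope GPU memory needed just to hold the weights.
# 6.05e9 approximates the parameter count of GPT-J 6B / Genji-JP 6B.
params = 6.05e9
bytes_fp16 = params * 2  # torch.float16: 2 bytes per parameter
bytes_fp32 = params * 4  # torch.float32: 4 bytes per parameter

print(f"fp16 weights: ~{bytes_fp16 / 1024**3:.1f} GiB")
print(f"fp32 weights: ~{bytes_fp32 / 1024**3:.1f} GiB")
```

In float16, as loaded in step 3, the weights come to a bit over 11 GiB, consistent with the 12 GB rule of thumb above; loading in float32 would roughly double that.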

License

The Genji-JP 6B model is distributed under the Apache-2.0 license, which permits use, modification, and redistribution, including commercially, provided the license text and attribution notices are retained.
