Genji-JP
NovelAI
Introduction
Genji-JP 6B is a Japanese language model finetuned on a storytelling dataset. It is based on EleutherAI's GPT-J 6B model and is designed for generating text in Japanese web novel contexts.
Architecture
The Genji-JP 6B model is structured as follows (a config check sketch appears after this list):
- 28 layers, each composed of a feedforward block and a self-attention block.
- Model dimension: 4096
- Feedforward dimension: 16384
- 16 attention heads with a dimension of 256 each.
- Rotary position encodings (RoPE) applied to 64 dimensions per head.
- Vocabulary size: 50,400, using the same Byte Pair Encoding (BPE) as GPT-2/GPT-3.
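These are GPT-J 6B's hyperparameters, unchanged by the finetune. As a quick check, they can be read back from the published model configuration; a minimal sketch (attribute names follow Hugging Face's GPT-J config class):

    from transformers import AutoConfig

    # Download the config for NovelAI/genji-jp and print the values listed above.
    config = AutoConfig.from_pretrained("NovelAI/genji-jp")
    print(config.n_layer)     # 28 transformer layers
    print(config.n_embd)      # model dimension: 4096
    print(config.n_head)      # 16 attention heads (256 dims each)
    print(config.rotary_dim)  # rotary position encoding dims per head: 64
    print(config.vocab_size)  # 50,400 BPE tokens
    # The feedforward dimension is 4 * n_embd = 16384 (GPT-J's default when n_inner is unset).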
Training
The model was pretrained on The Pile, a large-scale dataset curated by EleutherAI. Following pretraining, it was finetuned on a Japanese storytelling dataset to enhance its performance in generating coherent and contextually appropriate Japanese narratives.
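The finetuning recipe itself is not detailed here. Purely as an illustration of the finetuning step, a causal-language-model finetune of GPT-J on a storytelling corpus could be sketched with the Transformers Trainer as below; the dataset file, sequence length, and hyperparameters are placeholders rather than NovelAI's actual setup, and a 6B-parameter model realistically needs model parallelism or parameter-efficient methods rather than this plain single-device loop.

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    # Hypothetical plain-text corpus of Japanese stories, one document per line.
    dataset = load_dataset("text", data_files={"train": "stories.txt"})["train"]

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-J defines no pad token by default

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

    # Start from the pretrained GPT-J 6B checkpoint and continue training on the new corpus.
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="genji-jp-finetune",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            num_train_epochs=1,
            fp16=True,
        ),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()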
Guide: Running Locally
To use the Genji-JP 6B model locally:
- Install the Hugging Face Transformers library (for example, with `pip install transformers`) along with a CUDA-enabled PyTorch build.
- Import the necessary classes:

      from transformers import AutoTokenizer, AutoModelForCausalLM
      import torch
- Load the tokenizer and the model:

      # Genji-JP uses the same tokenizer as GPT-J 6B, so it is loaded from the EleutherAI repo.
      tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
      model = AutoModelForCausalLM.from_pretrained(
          "NovelAI/genji-jp", torch_dtype=torch.float16, low_cpu_mem_usage=True
      ).eval().cuda()
- Prepare your input text and generate text:

      text = '''Your input text here'''  # a Japanese narrative prompt suits the finetuning domain
      tokens = tokenizer(text, return_tensors="pt").input_ids

      # Sample up to 400 new tokens with nucleus sampling and a mild repetition penalty.
      generated_tokens = model.generate(
          tokens.long().cuda(),
          use_cache=True,
          do_sample=True,
          temperature=1.0,
          top_p=0.9,
          repetition_penalty=1.125,
          min_length=1,
          max_length=len(tokens[0]) + 400,
          pad_token_id=tokenizer.eos_token_id,
      )
      generated_text = tokenizer.decode(generated_tokens[0]).replace("�", "")
      print("Generation:\n" + generated_text)
- The float16 weights alone occupy roughly 12 GB, so a GPU with ample memory is needed; if local hardware falls short, cloud GPUs such as those on Google Cloud or AWS are an option (a lower-memory loading sketch follows this list).
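If a single GPU cannot hold the full float16 model, one fallback is Accelerate's automatic device placement, which keeps as many layers as fit on the GPU and offloads the rest to CPU memory at the cost of slower generation. A minimal sketch, assuming the accelerate package is installed; it replaces the plain .cuda() call above:

    from transformers import AutoModelForCausalLM
    import torch

    # Layers are assigned to the GPU until its memory is full; the remainder stay on the CPU.
    model = AutoModelForCausalLM.from_pretrained(
        "NovelAI/genji-jp",
        torch_dtype=torch.float16,
        device_map="auto",  # requires the accelerate package
    ).eval()

Generation then works as in the previous step, though inputs should be placed on the same device as the model's embedding layer.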
License
The Genji-JP 6B model is distributed under the Apache-2.0 license, which permits use, modification, and redistribution, provided the license text and notices are retained.