Genji-JP
NovelAI
Introduction
Genji-JP 6B is a Japanese language model finetuned on a storytelling dataset. It is based on EleutherAI's GPT-J 6B model and is designed for generating text in Japanese web novel contexts.
Architecture
The Genji-JP 6B model is structured as follows (a config check sketch appears after this list):
- 28 layers, each composed of a feedforward block and a self-attention block.
- Model dimension: 4096
- Feedforward dimension: 16384
- 16 attention heads with a dimension of 256 each.
- Rotary position encodings (RoPE) applied to 64 dimensions per head.
- Vocabulary size: 50,400, using the same Byte Pair Encoding (BPE) as GPT-2/GPT-3.
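These are GPT-J 6B's hyperparameters, unchanged by the finetune. As a quick check, they can be read back from the published model configuration; a minimal sketch (attribute names follow Hugging Face's GPT-J config class):

    from transformers import AutoConfig

    # Download the config for NovelAI/genji-jp and print the values listed above.
    config = AutoConfig.from_pretrained("NovelAI/genji-jp")
    print(config.n_layer)     # 28 transformer layers
    print(config.n_embd)      # model dimension: 4096
    print(config.n_head)      # 16 attention heads (256 dims each)
    print(config.rotary_dim)  # rotary position encoding dims per head: 64
    print(config.vocab_size)  # 50,400 BPE tokens
    # The feedforward dimension is 4 * n_embd = 16384 (GPT-J's default when n_inner is unset).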
Training
The model was pretrained on The Pile, a large-scale dataset curated by EleutherAI. Following pretraining, it was finetuned on a Japanese storytelling dataset to enhance its performance in generating coherent and contextually appropriate Japanese narratives.
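The finetuning recipe itself is not detailed here. Purely as an illustration of the finetuning step, a causal-language-model finetune of GPT-J on a storytelling corpus could be sketched with the Transformers Trainer as below; the dataset file, sequence length, and hyperparameters are placeholders rather than NovelAI's actual setup, and a 6B-parameter model realistically needs model parallelism or parameter-efficient methods rather than this plain single-device loop.

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    # Hypothetical plain-text corpus of Japanese stories, one document per line.
    dataset = load_dataset("text", data_files={"train": "stories.txt"})["train"]

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-J defines no pad token by default

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

    # Start from the pretrained GPT-J 6B checkpoint and continue training on the new corpus.
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="genji-jp-finetune",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            num_train_epochs=1,
            fp16=True,
        ),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()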
Guide: Running Locally
To use the Genji-JP 6B model locally:
- Install the Hugging Face Transformers library (for example, with `pip install transformers`) along with a CUDA-enabled PyTorch build.
- Import the necessary classes:

      from transformers import AutoTokenizer, AutoModelForCausalLM
      import torch
- Load the tokenizer and the model:

      # Genji-JP uses the same tokenizer as GPT-J 6B, so it is loaded from the EleutherAI repo.
      tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
      model = AutoModelForCausalLM.from_pretrained(
          "NovelAI/genji-jp", torch_dtype=torch.float16, low_cpu_mem_usage=True
      ).eval().cuda()
- Prepare your input text and generate text:

      text = '''Your input text here'''  # a Japanese narrative prompt suits the finetuning domain
      tokens = tokenizer(text, return_tensors="pt").input_ids

      # Sample up to 400 new tokens with nucleus sampling and a mild repetition penalty.
      generated_tokens = model.generate(
          tokens.long().cuda(),
          use_cache=True,
          do_sample=True,
          temperature=1.0,
          top_p=0.9,
          repetition_penalty=1.125,
          min_length=1,
          max_length=len(tokens[0]) + 400,
          pad_token_id=tokenizer.eos_token_id,
      )
      generated_text = tokenizer.decode(generated_tokens[0]).replace("�", "")
      print("Generation:\n" + generated_text)
- The float16 weights alone occupy roughly 12 GB, so a GPU with ample memory is needed; if local hardware falls short, cloud GPUs such as those on Google Cloud or AWS are an option (a lower-memory loading sketch follows this list).
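If a single GPU cannot hold the full float16 model, one fallback is Accelerate's automatic device placement, which keeps as many layers as fit on the GPU and offloads the rest to CPU memory at the cost of slower generation. A minimal sketch, assuming the accelerate package is installed; it replaces the plain .cuda() call above:

    from transformers import AutoModelForCausalLM
    import torch

    # Layers are assigned to the GPU until its memory is full; the remainder stay on the CPU.
    model = AutoModelForCausalLM.from_pretrained(
        "NovelAI/genji-jp",
        torch_dtype=torch.float16,
        device_map="auto",  # requires the accelerate package
    ).eval()

Generation then works as in the previous step, though inputs should be placed on the same device as the model's embedding layer.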
License
The Genji-JP 6B model is distributed under the Apache-2.0 license, which permits use, modification, and redistribution, provided the license text and notices are retained.