Fugaku-LLM-13B
Introduction
Fugaku-LLM is a large-scale language model developed using the Fugaku supercomputer. It emphasizes transparency and safety, is trained primarily on Japanese data, and performs particularly well on Japanese-language tasks.
Architecture
- Model Type: GPT-2
- Languages: Japanese, English
- Library: DeepSpeedFugaku
- Tokenizer: llm-jp-tokenizer (code10k_en20k_ja30k of v2.2)
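These details can be checked locally by inspecting the published configuration and tokenizer with transformers. The snippet below is a minimal sketch, assuming the Fugaku-LLM/Fugaku-LLM-13B repository on the Hugging Face Hub is accessible and exposes a standard GPT-2-style config.

```python
from transformers import AutoConfig, AutoTokenizer

# Sketch only: inspect the published config and tokenizer.
# Assumes the Fugaku-LLM/Fugaku-LLM-13B Hub repository is accessible.
model_path = "Fugaku-LLM/Fugaku-LLM-13B"
config = AutoConfig.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

print(config.model_type)           # expected to report a GPT-2-style architecture
print(tokenizer.vocab_size)        # vocabulary from llm-jp-tokenizer (code10k_en20k_ja30k)
print(tokenizer.tokenize("富岳"))   # how Japanese text is split into subword tokens
```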
Training
The model is pre-trained from scratch with a focus on Japanese data. Instruction tuning was performed using datasets such as oasst1, databricks-dolly-15k, and gsm8k, and the resulting model was evaluated on the Japanese MT-Bench.
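For illustration only, the sketch below shows how a databricks-dolly-15k record (fields: instruction, context, response) could be flattened into a single instruction-tuning example. The prompt template used here is hypothetical and is not Fugaku-LLM's actual instruction format, which should be taken from the model card of the instruction-tuned variant.

```python
# Hypothetical illustration of instruction-tuning data preparation.
# Field names match databricks-dolly-15k; the template itself is an assumption.
record = {
    "instruction": "Summarize the following text in one sentence.",
    "context": "The Fugaku supercomputer was jointly developed by RIKEN and Fujitsu.",
    "response": "Fugaku is a supercomputer built by RIKEN and Fujitsu.",
}

def to_training_text(rec: dict) -> str:
    # Concatenate instruction, optional context, and response into one training string.
    context = f"\n{rec['context']}" if rec.get("context") else ""
    return f"### Instruction:\n{rec['instruction']}{context}\n\n### Response:\n{rec['response']}"

print(to_training_text(record))
```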
Guide: Running Locally
Steps
- Install Dependencies:
  - torch
  - transformers
  - accelerate (required for device_map="auto" in the loading step below)
- Load Model and Tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the tokenizer and weights from the Hugging Face Hub, load the model
# in bfloat16, and let device_map="auto" place it on the available devices.
model_path = "Fugaku-LLM/Fugaku-LLM-13B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()
```
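If the full bfloat16 weights (roughly 26 GB for 13B parameters) do not fit on a single GPU, 8-bit loading is one option. This is an optional sketch, not part of the original guide; it assumes the bitsandbytes package is installed and a CUDA GPU is available.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Optional low-memory variant (sketch): quantize the weights to 8-bit at load time.
# Requires the bitsandbytes package in addition to torch/transformers/accelerate.
model_path = "Fugaku-LLM/Fugaku-LLM-13B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model.eval()
```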
- Generate Text:

```python
# Japanese completion prompt: "The name of the supercomputer 'Fugaku' ..."
prompt = "スーパーコンピュータ「富岳」という名称は"
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Sample up to 128 new tokens; temperature=0.1 keeps the output nearly greedy,
# while top_p=1.0 and top_k=0 disable nucleus and top-k filtering.
tokens = model.generate(
    input_ids.to(device=model.device),
    max_new_tokens=128,
    do_sample=True,
    temperature=0.1,
    top_p=1.0,
    repetition_penalty=1.0,
    top_k=0,
)
out = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(out)
```
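The same generation can also be wrapped in the transformers text-generation pipeline, which handles tokenization and decoding in a single call. This is a sketch that reuses the model and tokenizer loaded above with the same sampling settings.

```python
from transformers import pipeline

# Sketch: wrap the already-loaded model and tokenizer in a text-generation pipeline.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator(
    "スーパーコンピュータ「富岳」という名称は",
    max_new_tokens=128,
    do_sample=True,
    temperature=0.1,
    top_p=1.0,
    top_k=0,
)
print(result[0]["generated_text"])
```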
Cloud GPUs
For good performance with a 13B-parameter model, consider cloud GPU instances from providers such as AWS, GCP, or Azure.
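As a rough sizing guide, the weights alone occupy about 26 GB in bfloat16, before accounting for activations and the KV cache. The arithmetic below is only a back-of-envelope estimate.

```python
# Back-of-envelope GPU memory estimate for inference (weights only).
num_params = 13e9        # ~13 billion parameters
bytes_per_param = 2      # bfloat16 / float16
weights_gb = num_params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of memory for the weights alone")  # ~26 GB
```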
License
Fugaku-LLM is released under the Fugaku-LLM Terms of Use, available in the LICENSE and LICENSE_ja files. Users must adhere to these terms when using, modifying, or redistributing the model.