Fugaku-LLM-13B

Fugaku-LLM

Introduction

Fugaku-LLM is a large-scale language model developed using the Fugaku supercomputer. It was trained primarily on Japanese data, is designed to be highly transparent and safe, and excels at Japanese-language tasks.

Architecture

  • Model Type: GPT-2
  • Languages: Japanese, English
  • Library: DeepSpeedFugaku
  • Tokenizer: llm-jp-tokenizer (code10k_en20k_ja30k of v2.2)
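
As a quick sanity check of the tokenizer, the snippet below loads it through transformers and tokenizes a short Japanese phrase. This is a minimal sketch; it assumes the llm-jp-tokenizer files are bundled with the Fugaku-LLM/Fugaku-LLM-13B repository, as in the guide further down.

    from transformers import AutoTokenizer

    # Assumes the tokenizer ships with the model repository (see the guide below).
    tokenizer = AutoTokenizer.from_pretrained("Fugaku-LLM/Fugaku-LLM-13B")
    tokens = tokenizer.tokenize("スーパーコンピュータ「富岳」")  # "the supercomputer Fugaku"
    print(tokens)
    print(tokenizer.convert_tokens_to_ids(tokens))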

Training

The model is pre-trained from scratch, primarily on Japanese data. Instruction tuning was then performed with datasets such as oasst1, databricks-dolly-15k, and gsm8k, and the resulting model was evaluated on the Japanese MT-Bench.
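
The exact instruction template used for tuning is not reproduced here. As an illustration only, the sketch below shows one common way to flatten a databricks-dolly-15k-style record (instruction / context / response fields) into a single training string; the template itself is an assumption, not the official Fugaku-LLM format.

    # Illustrative only: this template is an assumption, not the official
    # Fugaku-LLM instruction-tuning format.
    def format_dolly_example(record: dict) -> str:
        text = f"Instruction:\n{record['instruction']}\n"
        if record.get("context"):
            text += f"Context:\n{record['context']}\n"
        text += f"Response:\n{record['response']}"
        return text

    example = {
        "instruction": "Summarize the following text.",
        "context": "Fugaku-LLM is a 13B-parameter model trained on the Fugaku supercomputer.",
        "response": "A 13B Japanese-focused language model trained on Fugaku.",
    }
    print(format_dolly_example(example))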

Guide: Running Locally

Steps

  1. Install Dependencies:

    • torch
    • transformers
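
    Both can be installed with pip, for example:

    pip install torch transformers
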
  2. Load Model and Tokenizer:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "Fugaku-LLM/Fugaku-LLM-13B"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # bfloat16 weights roughly halve memory use; device_map="auto" places them on available devices.
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")
    model.eval()
    
  3. Generate Text:

    # Japanese prompt: 'The name of the supercomputer "Fugaku" ...'; the model continues it.
    prompt = "スーパーコンピュータ「富岳」という名称は"
    input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
    tokens = model.generate(
        input_ids.to(device=model.device),
        max_new_tokens=128,
        do_sample=True,
        temperature=0.1,        # low temperature keeps sampling close to greedy decoding
        top_p=1.0,
        repetition_penalty=1.0,
        top_k=0                 # 0 disables top-k filtering
    )
    out = tokenizer.decode(tokens[0], skip_special_tokens=True)
    print(out)
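
    Note that generate returns the prompt tokens followed by the continuation, so the decoded string begins with the prompt. To print only the newly generated text, slice off the prompt length first (a short sketch reusing the variables above):

    # Keep only the tokens produced after the prompt.
    new_tokens = tokens[0][input_ids.shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))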
    

Cloud GPUs

For faster inference with this 13B-parameter model, consider using GPU instances from cloud providers such as AWS, GCP, or Azure.

License

Fugaku-LLM is released under the Fugaku-LLM Terms of Use, provided in the LICENSE and LICENSE_ja files. Users must adhere to these terms when using, modifying, or redistributing the model.
