llm-jp-3-172b-instruct3

Introduction

LLM-JP-3-172B-INSTRUCT3 is a large language model developed by the Research and Development Center for Large Language Models at the National Institute of Informatics (NII). It supports English and Japanese and is designed for text generation tasks.

Architecture

The LLM-JP models are based on the Transformer architecture. Variants in the family, including LLM-JP-3-172B-INSTRUCT3, differ in configuration, such as the number of layers, the hidden size, and the number of attention heads. The tokenizer is based on a Unigram model with byte fallback.
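As a quick illustration of byte fallback (a minimal sketch; assumes the transformers library and access to the Hugging Face Hub), pieces that are absent from the Unigram vocabulary are decomposed into byte-level tokens rather than mapped to an unknown token:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-172b-instruct3")
    # Mixed Japanese/English text; out-of-vocabulary pieces fall back to byte tokens.
    print(tokenizer.tokenize("自然言語処理 (NLP) は面白い。"))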

Training

Pre-Training

The models were pre-trained on a mixture of corpora, including Japanese Wikipedia, English Wikipedia, and Common Crawl; LLM-JP-3-172B saw a total of 2.1 trillion tokens during pre-training.
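For a sense of scale, the widely used 6ND rule of thumb gives a rough estimate of the pre-training compute (an approximation only; no official figure is stated here):

    # Approximate training FLOPs via the common 6 * N * D estimate.
    params = 172e9   # N: model parameters
    tokens = 2.1e12  # D: seen tokens reported for LLM-JP-3-172B
    print(f"{6 * params * tokens:.2e} FLOPs")  # ~2.17e+24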

Post-Training

The model then underwent supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO). Fine-tuning datasets include ichikara-instruction and synthetic datasets, chosen to improve the model's helpfulness and safety.
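For reference, DPO optimizes the policy against a frozen reference model using pairwise preference data. The sketch below shows only the standard DPO loss; it is illustrative and does not reflect the actual training code or hyperparameters used for this model:

    import torch.nn.functional as F

    def dpo_loss(pi_chosen_logp, pi_rejected_logp,
                 ref_chosen_logp, ref_rejected_logp, beta=0.1):
        """Standard DPO loss from summed log-probs of chosen/rejected responses."""
        chosen_reward = beta * (pi_chosen_logp - ref_chosen_logp)
        rejected_reward = beta * (pi_rejected_logp - ref_rejected_logp)
        # Maximize the margin between the chosen and rejected implicit rewards.
        return -F.logsigmoid(chosen_reward - rejected_reward)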

Guide: Running Locally

  1. Install Required Libraries:
    Ensure the following Python libraries are installed (a sample pip command follows the list):

    • torch>=2.3.0
    • transformers>=4.40.1
    • tokenizers>=0.19.1
    • accelerate>=0.29.3
    • flash-attn>=2.5.8
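    For example, using pip (note that flash-attn generally needs torch installed first, so it is installed in a second step):

    pip install "torch>=2.3.0" "transformers>=4.40.1" "tokenizers>=0.19.1" "accelerate>=0.29.3"
    pip install "flash-attn>=2.5.8"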
  2. Load Model and Tokenizer:

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    # Load the tokenizer and the model; device_map="auto" (via accelerate)
    # shards the weights across all available GPUs.
    tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-172b-instruct3")
    model = AutoModelForCausalLM.from_pretrained(
        "llm-jp/llm-jp-3-172b-instruct3",
        device_map="auto",
        torch_dtype=torch.bfloat16,
    )
    
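    Optionally, since flash-attn is listed among the requirements, you can request the FlashAttention-2 backend when loading (an optional variation; assumes a GPU generation that FlashAttention-2 supports):

    # Same load call, but using the FlashAttention-2 attention backend.
    model = AutoModelForCausalLM.from_pretrained(
        "llm-jp/llm-jp-3-172b-instruct3",
        device_map="auto",
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    )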
  3. Prepare Input and Generate Output:

    chat = [
        # System prompt (Japanese): "Below is an instruction that describes a
        # task. Write a response that appropriately satisfies the request."
        {"role": "system", "content": "以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。"},
        # User message (Japanese): "What is natural language processing?"
        {"role": "user", "content": "自然言語処理とは何か"},
    ]
    # Apply the model's chat template and move the token IDs to the model's device.
    tokenized_input = tokenizer.apply_chat_template(
        chat, add_generation_prompt=True, tokenize=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        output = model.generate(
            tokenized_input,
            max_new_tokens=100,
            do_sample=True,        # sample instead of greedy decoding
            top_p=0.95,            # nucleus sampling threshold
            temperature=0.7,
            repetition_penalty=1.05,
        )[0]
    print(tokenizer.decode(output))
    
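    To see exactly what the chat template feeds the model, you can render the prompt as a string instead of token IDs (illustration only; the template text itself is defined in the tokenizer configuration):

    # Render the chat as a prompt string rather than token IDs.
    prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
    print(prompt)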
  4. Use Cloud GPUs:
    At 172B parameters, the model's bfloat16 weights alone occupy roughly 344 GB, so running it requires a multi-GPU node; consider cloud GPU instances from providers such as AWS, Google Cloud, or Azure (see the estimate below).
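    As a rough sanity check on memory (weights only; excludes KV cache and activations):

    # bfloat16 stores 2 bytes per parameter.
    params = 172e9
    print(params * 2 / 1e9, "GB")  # 344.0 GB for the weights alone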

License

The model is released under the "llm-jp-3-172b-instruct3-tou" license. For detailed licensing information, refer to the LICENSE file.
