Llama 3.1 Swallow 70B Instruct v0.3

tokyotech-llm

Introduction

Llama 3.1 Swallow is a series of large language models designed to enhance Japanese language capabilities while retaining English proficiency. The models are built by continual pre-training from Meta Llama 3.1 and are instruction-tuned for improved conversational ability.

Architecture

The models use the Llama architecture and are trained with the Megatron-LM library. They are available in 8B and 70B parameter variants. Details of the tokenizer can be found in the Llama 3.1 blog.
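
As a quick, minimal sketch (field names follow the standard Llama configuration in transformers; check the published config.json for exact values), the shape of the 70B variant can be inspected without downloading the weights:

    from transformers import AutoConfig

    # Fetch only the model configuration, not the weights.
    config = AutoConfig.from_pretrained("tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3")
    print(config.num_hidden_layers)  # transformer depth
    print(config.hidden_size)        # model width
    print(config.vocab_size)         # Llama 3.1 tokenizer vocabulary size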

Training

The Llama 3.1 Swallow models are trained on a diverse dataset comprising a Japanese web corpus, Wikipedia articles, and math and coding content. Instruction tuning is performed with datasets such as lmsys-chat-1m-synth and gemma-magpie to improve multi-turn Japanese instruction following.
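
For illustration, a multi-turn Japanese exchange can be formatted with the model's chat template (a minimal sketch; the conversation content is only an example):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3")

    # A multi-turn conversation: earlier assistant turns stay in the prompt so the
    # model can resolve the follow-up question against them.
    message = [
        {"role": "user", "content": "東京の観光名所を教えてください。"},          # "Tell me about sightseeing spots in Tokyo."
        {"role": "assistant", "content": "浅草寺や東京スカイツリーが有名です。"},  # "Senso-ji and Tokyo Skytree are famous."
        {"role": "user", "content": "子供連れにおすすめなのはどちらですか?"},     # "Which do you recommend for families with children?"
    ]
    prompt = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
    print(prompt)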

Guide: Running Locally

  1. Installation
    Install vLLM (it also installs the transformers library used below):

    pip install vllm
    
  2. Loading the Model
    Load the tokenizer with transformers and the model with vLLM:

    from transformers import AutoTokenizer
    from vllm import LLM, SamplingParams

    model_name = "tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # The 70B weights do not fit on a single GPU in bf16; shard the model across
    # GPUs (e.g. tensor_parallel_size=4 on 4x 80 GB cards) to match your hardware.
    llm = LLM(model=model_name, tensor_parallel_size=4)

  3. Generating Text
    Define the conversation and sampling parameters, then generate text:

    # Illustrative prompt: ask about sightseeing spots in Tokyo.
    message = [
        {"role": "user", "content": "東京の観光名所を教えてください。"},
    ]
    sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=512, stop="<|eot_id|>")
    prompt = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
    output = llm.generate(prompt, sampling_params)
    print(output[0].outputs[0].text)

  4. Hardware Recommendation
    In bfloat16 the 70B weights alone occupy roughly 140 GB, so plan for multiple high-memory cloud GPUs (for example, 4x NVIDIA A100 80 GB) and set tensor_parallel_size to match; a rough estimate is sketched below.
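
    A minimal back-of-the-envelope sketch of the weight memory (actual usage is higher once the KV cache and runtime overhead are included):

    # Rough weight-memory estimate for a 70B-parameter model stored in bfloat16.
    params = 70e9                # parameter count
    bytes_per_param = 2          # bfloat16 uses 2 bytes per parameter
    weight_gb = params * bytes_per_param / 1e9
    print(f"~{weight_gb:.0f} GB of weights")  # ~140 GB, i.e. more than one 80 GB GPU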

License

The models are released under the META LLAMA 3.1 Community License and the Gemma Terms of Use.
