Llama 3.1 Swallow 8B Instruct v0.3

tokyotech-llm

Introduction

Llama 3.1 Swallow is a series of large language models, including 8B and 70B variants, designed to enhance Japanese language capabilities alongside English. These models are built by continual pre-training on Meta Llama 3.1 models and further tuned using a large corpus of Japanese and English text.

Architecture

Llama 3.1 Swallow models are based on the Llama architecture and are developed using the Megatron-LM library. They support both Japanese and English, improving on the original Llama 3.1 models' Japanese proficiency through extensive continual pre-training on diverse datasets.

Training

The models undergo continual pre-training on approximately 200 billion tokens sourced from a Japanese web corpus, Wikipedia, and other content. Instruction-tuned models are then created through supervised fine-tuning on synthetic data specifically tailored for Japanese. Datasets include lmsys-chat-1m-synth and filtered-magpie-ultra-ja, among others.

Guide: Running Locally

  1. Installation: Install vLLM (this also pulls in the transformers library used below):
    pip install vllm
    
  2. Setup: Load the tokenizer with the transformers library and the model with vLLM:
    from transformers import AutoTokenizer
    from vllm import LLM, SamplingParams
    
    model_name = "tokyotech-llm/Llama-3.1-Swallow-8B-Instruct-v0.3"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    llm = LLM(model=model_name, tensor_parallel_size=1)
    
  3. Inference: Define and run an inference task:
    sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=512, stop="<|eot_id|>")
    message = [
        # "You are a sincere and excellent Japanese assistant."
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
        # "Write a heartwarming story set in an autumn-foliage park in Tokyo, with Tokyo Tower
        # and skyscrapers in the background, where a swallow soaring in the sky meets a llama
        # standing in a grassy field."
        {"role": "user", "content": "東京の紅葉した公園で、東京タワーと高層ビルを背景に、空を舞うツバメと草地に佇むラマが出会う温かな物語を書いてください。"},
    ]
    prompt = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
    output = llm.generate(prompt, sampling_params)
    print(output[0].outputs[0].text)
    
  4. Cloud GPUs: For scaling and performance, consider using cloud GPU services like AWS, Google Cloud, or Azure.
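The apply_chat_template call in step 3 renders the message list into Llama 3.1's prompt format before generation. As a rough illustration, the sketch below rebuilds that format by hand using the standard Llama 3.1 special tokens. This is an assumption-laden simplification (the model's real template also injects metadata such as the knowledge-cutoff date into the system turn), so in practice always use tokenizer.apply_chat_template; the helper name render_llama31_prompt is hypothetical.

```python
# Minimal sketch of how a Llama 3.1-style chat prompt is rendered.
# Illustration only: the tokenizer's actual Jinja template is authoritative.

def render_llama31_prompt(messages, add_generation_prompt=True):
    """Concatenate chat messages using Llama 3.1's special tokens."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = render_llama31_prompt(messages)
print(prompt)
```

This also clarifies why step 3 sets stop="<|eot_id|>": the model emits that token to close its own turn, so generation should halt there.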

License

The Llama 3.1 Swallow models are released under the Meta Llama 3.1 Community License and the Gemma Terms of Use (the latter because portions of the instruction-tuning data were generated with Gemma models). For more details, refer to the specific license documents provided by the respective organizations.
