Llama 3.1 Swallow 70B Instruct v0.3
Introduction
Llama 3.1 Swallow is a series of large language models designed to enhance Japanese language capabilities while retaining English proficiency. The models are built through continual pre-training of Meta Llama 3.1 and are instruction-tuned for improved conversational abilities.
Architecture
The models use the Llama architecture and are developed with the Megatron-LM library. They are available in 8B and 70B parameter variants. Details of the tokenizer can be found in the Llama 3.1 blog.
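As a minimal sketch of working with the inherited tokenizer (assuming only the tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3 identifier used later in this card), the snippet below loads it with transformers and inspects how it handles Japanese text:

```python
from transformers import AutoTokenizer

# Loading only the tokenizer is lightweight and does not download model weights.
model_name = "tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The Swallow models keep the Llama 3.1 tokenizer, so the vocabulary size
# should match the base model's (on the order of 128k entries).
print(len(tokenizer))
print(tokenizer.tokenize("東京の観光名所を教えてください。"))
```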
Training
The Llama 3.1 Swallow models are trained on a diverse dataset comprising a Japanese web corpus, Wikipedia articles, and math and coding content. Instruction tuning uses datasets such as lmsys-chat-1m-synth and gemma-magpie to improve multi-turn Japanese instruction following.
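As an illustration of what multi-turn input looks like at inference time (a sketch that assumes the chat template shipped with the model and uses made-up conversation content), the tokenizer's apply_chat_template can serialize a short two-turn Japanese dialogue into a single prompt:

```python
from transformers import AutoTokenizer

model_name = "tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Illustrative multi-turn conversation: earlier assistant turns are kept in the
# list so the model conditions on the full dialogue history.
messages = [
    {"role": "user", "content": "日本の首都はどこですか？"},
    {"role": "assistant", "content": "日本の首都は東京です。"},
    {"role": "user", "content": "その都市の人口はどれくらいですか？"},
]

# Serialize the conversation with the model's chat template and append the
# header that requests a new assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```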
Guide: Running Locally
- Installation

  Install the required packages:

  ```bash
  pip install vllm
  ```
- Loading the Model

  Use the transformers library to load the tokenizer, and vLLM to load the model (see the sizing sketch after this guide for sharding the 70B weights across multiple GPUs):

  ```python
  from transformers import AutoTokenizer
  from vllm import LLM, SamplingParams

  model_name = "tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  llm = LLM(model=model_name, tensor_parallel_size=1)
  ```
- Generating Text

  Define the sampling parameters, build a chat-formatted prompt (an illustrative user message is used here), and generate text:

  ```python
  sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=512, stop="<|eot_id|>")

  # Illustrative single-turn message; any chat-style list of messages works.
  message = [{"role": "user", "content": "東京の観光名所を教えてください。"}]

  prompt = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
  output = llm.generate(prompt, sampling_params)
  print(output[0].outputs[0].text)
  ```
- Hardware Recommendation

  Consider cloud GPUs such as the NVIDIA A100. The 70B model does not fit on a single GPU in bf16, so plan for multiple devices; a rough memory estimate follows this guide.
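As a rough sizing sketch referenced in the loading and hardware notes above (back-of-the-envelope arithmetic and an illustrative tensor_parallel_size, not an official requirement): the 70B weights alone take roughly 140 GB in bf16 (70e9 parameters × 2 bytes), before KV cache and activations, so they are typically sharded across several GPUs with vLLM's tensor parallelism.

```python
from vllm import LLM

# Back-of-the-envelope weight memory for the 70B checkpoint in bf16:
# 70e9 parameters * 2 bytes ≈ 140 GB (≈ 130 GiB), excluding KV cache.
weight_gib = 70e9 * 2 / 1024**3
print(f"approx. weight memory: {weight_gib:.0f} GiB")

# Shard the model across GPUs with tensor parallelism. The value 4 is
# illustrative (e.g. 4 x A100 80GB); set it to the number of GPUs available.
llm = LLM(
    model="tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3",
    tensor_parallel_size=4,
)
```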
License
The models are released under the META LLAMA 3.1 Community License and the Gemma Terms of Use.