LLaMA-3-MERaLiON-8B-Instruct

MERaLiON

Introduction

LLaMA-3-MERaLiON-8B-Instruct is a large language model (LLM) designed for multilingual understanding and instruction-following tasks. Developed by I2R, A*STAR, it builds upon the Llama-3-8B architecture and focuses on English, Chinese, and Indonesian. The model is distributed under the MERaLiON Public License.

Architecture

The model is a text decoder with a context length of 8192 tokens, based on the Llama-3.1-8B architecture. It was produced through extended pretraining with a Southeast Asian (SEA) multilingual corpus-mixing strategy, intended to improve language understanding across diverse contexts.
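Because the prompt and the generated tokens share the same 8192-token window, it can help to budget the prompt length before calling the model. The sketch below is only arithmetic: the helper names are hypothetical, and the prompt token count is assumed to come from the model's own tokenizer.

```python
CONTEXT_LENGTH = 8192  # context length stated in this section


def max_prompt_tokens(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Largest prompt (in tokens) that still leaves room for the requested output."""
    if not 0 <= max_new_tokens <= context_length:
        raise ValueError("max_new_tokens must be between 0 and context_length")
    return context_length - max_new_tokens


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 context_length: int = CONTEXT_LENGTH) -> bool:
    """True if the prompt plus the generation budget fits in the window."""
    return prompt_tokens + max_new_tokens <= context_length


print(max_prompt_tokens(256))  # 7936
```

With `max_new_tokens=256`, as in the local-run example, a prompt of up to 7936 tokens fits.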

Training

The extended pretraining stage used over 120 billion tokens, combining domain-diversified corpus selection with optimized training techniques. These include hyperparameter tuning and replay strategies to maintain stability and quality. The model's instruction-following capabilities were obtained by merging weights from other models rather than by training on additional supervised data.
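The text above does not say which merging method was used. As a rough illustration of the general idea only, the sketch below linearly interpolates two checkpoints parameter by parameter. The function name, the `alpha` weight, and the use of plain Python lists in place of tensors are all assumptions made for this example, not the authors' procedure.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint's state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_weights(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Return alpha * a + (1 - alpha) * b for every parameter, element by element."""
    if a.keys() != b.keys():
        raise ValueError("both checkpoints must have the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged


# Hypothetical two-value "checkpoints" standing in for real model weights.
pretrained = {"layer.weight": [1.0, 3.0]}
instruct = {"layer.weight": [3.0, 5.0]}
print(merge_weights(pretrained, instruct, alpha=0.5))  # {'layer.weight': [2.0, 4.0]}
```

In practice the same interpolation would be applied to tensors loaded from real checkpoints, and other merging schemes exist. This one is shown only because it is the simplest.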

Guide: Running Locally

  1. Prerequisites:

    • Install the 🤗 Transformers library.
    • Ensure PyTorch is installed with support for GPU acceleration.
  2. Code Example:

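    import transformers
    import torch
    
    model_id = "MERaLiON/MERaLiON-LLaMA-3-8B-Instruct"
    
    # Build a text-generation pipeline; bfloat16 halves weight memory versus
    # float32, and device_map="auto" places the model on the available GPU(s).
    pipeline = transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )
    # Chat-style input: a list of messages, each with a role and content.
    messages = [
        {"role": "user", "content": "What is the sentiment of the following sentence?\nSentence: This book is incredibly dull.\nAnswer:"},
    ]
    
    outputs = pipeline(
        messages,
        max_new_tokens=256,
    )
    # "generated_text" holds the whole conversation; the last entry is the model's reply.
    print(outputs[0]["generated_text"][-1])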
    
  3. Suggested Hardware:

    • Use a GPU such as the NVIDIA H100, for example on a cloud instance, for efficient inference.
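To judge whether a given GPU is large enough, a back-of-the-envelope estimate of the weight memory can help. The sketch below assumes a nominal 8 billion parameters (taken from the model name) stored in bfloat16 at 2 bytes each, as in the code example above. It ignores activations and the key-value cache, so real usage will be higher. The function name is hypothetical.

```python
def weight_memory_gib(num_params: float = 8e9, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


print(f"{weight_memory_gib():.1f} GiB")  # 14.9 GiB
```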

License

The model is available under the MERaLiON Public License. For further details, refer to the license document. Additionally, Meta Llama 3 is covered by the Meta Llama 3 Community License, with all rights reserved by Meta Platforms, Inc.
