Llama-3.1-Asian-Bllossom-8B-Translator-GGUF

QuantFactory

Introduction

Llama-3.1-Asian-Bllossom-8B-Translator-GGUF is a multilingual translation model optimized for translation between Korean and several Southeast Asian languages. It is an 8-billion-parameter model fine-tuned from the Llama 3.1 Instruct base model.

Architecture

The model uses the base architecture of meta-llama/Llama-3.1-8B-Instruct. It is designed to handle short text segments and to facilitate communication between Korean, Vietnamese, Indonesian, Khmer, and Thai. Performance is optimized for short sentences, and quality may degrade on longer or more complex text; one simple mitigation is to translate sentence by sentence, as sketched below.
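
Because quality drops on long passages, a common workaround is to split input into sentences and translate them one at a time. A minimal sketch; the translate function here is a hypothetical placeholder that should be wired to the generation code from the guide further down:

    import re

    def translate(sentence: str) -> str:
        # Hypothetical placeholder: replace with the model call
        # shown in the "Running Locally" guide below.
        return sentence

    def translate_long(text: str) -> str:
        # Naive split on ., !, ? followed by whitespace; adequate for
        # short paragraphs but not for scripts that rarely use periods,
        # such as Thai or Khmer.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        return " ".join(translate(s) for s in sentences if s)

    print(translate_long("First sentence. Second sentence."))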

Training

The model was trained on a dataset of 20 million examples, 1 million for each of the 20 translation directions among the five supported languages. The data focuses on providing robust translations of common expressions and basic conversations. Evaluation uses BLEU and ROUGE scores, which indicate varying translation quality depending on the language pair and content complexity.
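
For reference, corpus-level BLEU of the kind reported here can be computed with the sacrebleu library. A minimal sketch with made-up hypothesis/reference pairs (not from the model's actual evaluation set):

    import sacrebleu  # pip install sacrebleu

    # Hypothetical data: model outputs and one reference per output.
    hypotheses = ["xin chào thế giới", "cảm ơn bạn rất nhiều"]
    references = [["xin chào thế giới", "cảm ơn bạn nhiều"]]

    # corpus_bleu takes the hypotheses and a list of reference streams.
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU: {bleu.score:.2f}")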

Guide: Running Locally

To run the model locally with the Transformers library, follow these steps; a llama.cpp-based alternative for the GGUF files is sketched after the list.

  1. Install Dependencies: Ensure you have PyTorch, Transformers, and Accelerate installed (Accelerate is required for device_map="auto").

    pip install torch transformers accelerate
    
  2. Load the Model and Tokenizer:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the fine-tuned weights; torch_dtype="auto" picks a suitable
    # precision and device_map="auto" places layers on available devices.
    model = AutoModelForCausalLM.from_pretrained(
        "MLP-KTLim/llama-3.1-Asian-Bllossom-8B-Translator",
        torch_dtype="auto",
        device_map="auto",
    )

    tokenizer = AutoTokenizer.from_pretrained(
        "MLP-KTLim/llama-3.1-Asian-Bllossom-8B-Translator",
    )
    
  3. Prepare Input:

    input_text = "Your text here"

    # Build a chat-formatted prompt; the system message should state the
    # translation instruction (source and target language).
    input_ids = tokenizer.apply_chat_template(
        conversation=[
            {"role": "system", "content": "Translation prompt"},
            {"role": "user", "content": input_text},
        ],
        tokenize=True,
        return_tensors="pt",
        add_generation_prompt=True,
    )
    
  4. Generate Translation:

    output = model.generate(
        input_ids.to(model.device),
        max_new_tokens=128,
    )

    # Strip the prompt tokens and decode only the generated translation.
    print(tokenizer.decode(output[0][len(input_ids[0]):], skip_special_tokens=True))
    
  5. Cloud GPUs: For improved performance, consider using cloud GPU services such as AWS, Google Cloud, or Azure.
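
Since this repository distributes GGUF quantizations, the model can also run without PyTorch via llama.cpp bindings. A minimal sketch using llama-cpp-python; the quant filename below is a hypothetical example, so substitute the actual .gguf file from the repository:

    from llama_cpp import Llama  # pip install llama-cpp-python

    # Hypothetical filename; use the actual quant shipped in this repo.
    llm = Llama(
        model_path="llama-3.1-Asian-Bllossom-8B-Translator.Q4_K_M.gguf",
        n_ctx=2048,
    )

    result = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "Translation prompt"},
            {"role": "user", "content": "Your text here"},
        ],
        max_tokens=128,
    )
    print(result["choices"][0]["message"]["content"])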

License

The model is released under the Llama 3.1 Community License. Ensure compliance with its terms when using the model in your applications.
