Llama 3.2 Taiwan 3 B Instruct G G U F

QuantFactory

Introduction

The Llama-3.2-Taiwan-3B-Instruct-GGUF is a quantized version of the original Llama-3.2-Taiwan-3B-Instruct model, designed for text generation tasks. It has been fine-tuned with a focus on Traditional Chinese and multiple languages, incorporating Taiwan-specific knowledge.

Architecture

The model utilizes the LlamaForCausalLM architecture. It supports multiple languages including Traditional Chinese, English, Italian, German, French, Japanese, and Korean. The model is fine-tuned from the base model lianghsun/Llama-3.2-Taiwan-3B.

Training

Training involved using a mixture of Traditional Chinese and multilingual datasets for instruction fine-tuning and Direct Preference Optimization (DPO). Key datasets include tw-legal-nlp, tw-judgment-qa, and more. Training hyperparameters include a learning rate of 5e-05, batch size of 105, and 4 GPUs in a multi-GPU setup. The training process involved 5 epochs, achieving a train loss of 0.8533.

Guide: Running Locally

Basic Steps

  1. Set Up Environment: Ensure you have Docker and NVIDIA drivers installed.
  2. Run Docker Image:
    docker run --runtime nvidia --gpus all \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HUGGING_FACE_HUB_TOKEN=<your_token>" \
        -p 8000:8000 \
        --ipc=host \
        vllm/vllm-openai:latest \
        --model lianghsun/Llama-3.2-Taiwan-3B-Instruct
    
  3. Optional Versioning: To use a specific version, append --revision <tag_name> to the Docker command.

Cloud GPUs

Consider using cloud providers like AWS, Google Cloud, or Azure to access powerful GPUs like NVIDIA H100 NVL for efficient model execution.

License

The model is released under the llama3.2 license. For detailed licensing information, refer to the original license file provided with the model.

More Related APIs in Text Generation