Llama-3.2-Taiwan-3B-GGUF

Introduction

Llama-3.2-Taiwan-3B-GGUF is a quantized version of lianghsun/Llama-3.2-Taiwan-3B, converted to the GGUF format used by llama.cpp. It is designed to generate text in both Traditional Chinese and English, with a focus on Taiwanese language and culture.

Architecture

The model is based on meta-llama/Llama-3.2-3B and was further trained through continual pre-training on a substantial corpus of Traditional Chinese and multilingual data. It supports text generation, and its small size (about 3 billion parameters) makes it suited to environments with limited hardware resources.

Training

Training Data

The model was trained on a diverse range of datasets, including:

Traditional Chinese datasets such as lianghsun/tw-novel-1.1B, lianghsun/tw-finance-159M, and other specialized corpora.
Multilingual datasets such as intfloat/multilingual_cc_news.

Training Procedure

Preprocessing included formatting text to handle mixed character types and truncating samples longer than 4,096 tokens. Training ran on a single node with 4 devices, using the AdamW optimizer with a cosine learning-rate schedule for 10 epochs.
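The truncation step can be pictured with a minimal sketch. This is not the author's pipeline: it assumes each sample has already been tokenized into a list of integer token IDs, and the helper name `truncate_samples` is hypothetical. Only the 4,096-token limit comes from the description above.

```python
from typing import List

MAX_TOKENS = 4096  # token limit stated in the training description


def truncate_samples(samples: List[List[int]], max_tokens: int = MAX_TOKENS) -> List[List[int]]:
    """Cut every tokenized sample down to at most `max_tokens` tokens.

    Hypothetical helper: samples are assumed to be lists of token IDs
    already produced by the model's tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    # Slicing leaves shorter samples unchanged and drops everything past the limit.
    return [sample[:max_tokens] for sample in samples]


if __name__ == "__main__":
    short_sample = list(range(10))
    long_sample = list(range(5000))
    out = truncate_samples([short_sample, long_sample])
    print([len(s) for s in out])  # [10, 4096]
```

Truncation keeps the beginning of each long sample; an alternative would be to split long texts into several 4,096-token chunks, but the description only mentions truncation.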

Training Hyperparameters

Learning Rate: 5e-6
Batch Size (per device): 8 (train), 4 (eval)
Gradient Accumulation Steps: 50
Total Train Batch Size: 1,600
Optimizer: AdamW (torch_fused)
Scheduler: Cosine with warmup
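Assuming 8 is the per-device train batch size, the listed values are consistent: 8 per device × 50 accumulation steps × 4 devices = 1,600 samples per optimizer update. The sketch below checks that arithmetic and shows the general shape of a cosine schedule with linear warmup peaking at the stated 5e-6. The warmup length and total step count are not documented, so `warmup_steps` and `total_steps` are placeholder assumptions, and the formula is the common linear-warmup-then-cosine-decay form rather than a confirmed copy of the training configuration.

```python
import math

# Values taken from the hyperparameter list.
PER_DEVICE_TRAIN_BATCH = 8
GRAD_ACCUM_STEPS = 50
NUM_DEVICES = 4
PEAK_LR = 5e-6


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of samples contributing to one optimizer update."""
    return per_device * accum * devices


def cosine_lr_with_warmup(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0 to peak_lr, then cosine decay to 0 at total_steps.

    A common schedule shape; the exact warmup used for this model is an assumption.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at 0 after the schedule ends
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_TRAIN_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 1600
    # Hypothetical schedule length: 100 warmup steps out of 1,000 total.
    for step in (0, 50, 100, 550, 1000):
        print(step, cosine_lr_with_warmup(step, PEAK_LR, warmup_steps=100, total_steps=1000))
```

With a large effective batch of 1,600 and a small peak learning rate, each update is a gentle step, which is a typical choice for continual pre-training on top of an existing base model.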

Guide: Running Locally

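To serve the model locally through vLLM's OpenAI-compatible API server on port 8000, run the following Docker command, replacing <secret> with your Hugging Face access token. Note that this command loads the original lianghsun/Llama-3.2-Taiwan-3B weights rather than the GGUF files: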

docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model lianghsun/Llama-3.2-Taiwan-3B

To load a specific checkpoint, append --revision <tag_name> to the command.
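Once the container is running, the server can be queried over HTTP. The sketch below uses only the Python standard library to send a text-completion request. It assumes the server is reachable at `http://localhost:8000`, that the model name matches the `--model` value passed to the container, and that the plain `/v1/completions` endpoint is the right fit, since the card describes continual pre-training rather than chat tuning. The helper names and sampling values are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumed endpoint: vLLM's OpenAI-compatible server started by the Docker command.
BASE_URL = "http://localhost:8000/v1"
MODEL = "lianghsun/Llama-3.2-Taiwan-3B"  # must match the --model value


def build_completion_request(prompt: str, max_tokens: int = 64, temperature: float = 0.7) -> dict:
    """Assemble an OpenAI-style /v1/completions payload (illustrative sampling values)."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, **kwargs) -> str:
    """POST the payload to the server and return the first generated text."""
    payload = json.dumps(build_completion_request(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

For example, with the container up, `complete("台灣最高的山是", max_tokens=32)` returns the model's continuation of the prompt ("Taiwan's highest mountain is"). The same request works with any OpenAI-compatible client library pointed at the local base URL.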

Cloud GPUs

If a suitable local GPU is not available, consider a cloud GPU service from a major provider such as AWS, Google Cloud, or Azure.

License

The model is released under the Llama 3.2 Community License (identifier: llama3.2). For details, see the license file.
