Llama-3.2-Taiwan-3B-GGUF
QuantFactory/Llama-3.2-Taiwan-3B-GGUF
Introduction
Llama-3.2-Taiwan-3B-GGUF is a quantized model derived from lianghsun/Llama-3.2-Taiwan-3B, optimized using llama.cpp. It is designed to generate text in both Traditional Chinese and English, with a focus on Taiwanese language and culture.
Architecture
The model is based on meta-llama/Llama-3.2-3B, a foundation model that was then continually pre-trained on a substantial corpus of Traditional Chinese and multilingual data. The architecture supports text generation and is tailored for environments with limited hardware resources.
Training
Training Data
The model was trained using a diverse range of datasets, including:
- Traditional Chinese datasets like lianghsun/tw-novel-1.1B, lianghsun/tw-finance-159M, and other specialized corpora.
- Multilingual datasets such as intfloat/multilingual_cc_news.
Training Procedure
Training involved preprocessing steps such as formatting text to handle mixed character types and truncating examples that exceed a 4096-token limit. It ran on a single node with 4 devices, using the AdamW optimizer with a cosine learning rate scheduler over 10 epochs.
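The truncation step can be illustrated with a minimal sketch. It assumes each example is already a list of token ids and that the excess is cut from the end; the source does not say whether over-length examples were cut or dropped, so truncate_to_limit and the stand-in data are hypothetical.

```python
MAX_TOKENS = 4096  # token limit stated in the training procedure


def truncate_to_limit(token_ids, max_tokens=MAX_TOKENS):
    """Keep at most max_tokens leading tokens of one tokenized example.

    Assumption: excess tokens are cut from the end of the example.
    """
    return token_ids[:max_tokens]


# Stand-in data: plain integer lists play the role of tokenizer output.
examples = [list(range(10)), list(range(5000))]
truncated = [truncate_to_limit(ids) for ids in examples]
print([len(ids) for ids in truncated])  # [10, 4096]
```

Short examples pass through unchanged, and only examples longer than the limit are shortened.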
Training Hyperparameters
- Learning Rate: 5e-6
- Batch Size: 8 (train), 4 (eval)
- Gradient Accumulation Steps: 50
- Total Train Batch Size: 1,600
- Optimizer: AdamW (torch_fused)
- Scheduler: Cosine with warmup
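The total train batch size follows from the other values: 8 per device x 50 accumulation steps x 4 devices = 1,600. The sketch below checks that arithmetic and shows one common shape for a cosine schedule with warmup (linear ramp, then cosine decay to zero). The warmup length and total step count are not given in the source, so the values used here are placeholders, and cosine_with_warmup is an illustrative helper rather than the author's training code.

```python
import math

PER_DEVICE_TRAIN_BATCH = 8   # from the hyperparameter list
GRAD_ACCUM_STEPS = 50        # from the hyperparameter list
NUM_DEVICES = 4              # from the training procedure
PEAK_LR = 5e-6               # from the hyperparameter list

# Effective number of examples per optimizer step.
effective_batch = PER_DEVICE_TRAIN_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
print(effective_batch)  # 1600


def cosine_with_warmup(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Linear warmup to peak_lr, then cosine decay to zero.

    Assumption: this is the common warmup-plus-cosine shape; the exact
    schedule used for this model is not documented in the source.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Placeholder step counts, chosen only to exercise the function.
TOTAL_STEPS, WARMUP_STEPS = 1000, 100
for s in (0, 50, 100, 550, 1000):
    print(s, cosine_with_warmup(s, TOTAL_STEPS, WARMUP_STEPS))
```

The learning rate starts at zero, reaches 5e-6 at the end of warmup, and decays back toward zero by the final step.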
Guide: Running Locally
To serve the original lianghsun/Llama-3.2-Taiwan-3B checkpoint locally with vLLM, use the following Docker command:
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model lianghsun/Llama-3.2-Taiwan-3B
For different checkpoints, append --revision <tag_name> to the command.
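Once the container is up, the server can be queried over HTTP. The sketch below assumes the vllm/vllm-openai image exposes its OpenAI-compatible completions endpoint at /v1/completions on the mapped port 8000, and that the model is addressed by the name passed to --model. The helper names, the sampling values, and the timeout are illustrative choices, not part of the original guide.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"        # port mapped by -p 8000:8000
MODEL = "lianghsun/Llama-3.2-Taiwan-3B"   # name passed to --model


def build_completion_request(prompt, max_tokens=64, temperature=0.7):
    """Build a POST request for the OpenAI-compatible completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,     # illustrative sampling values
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt):
    """Send the prompt to the running server and return the generated text."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, calling complete("台灣最高的山是") should return a continuation of the prompt; without a server, the call raises a connection error.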
Cloud GPUs
For better performance, consider cloud GPU services from major providers such as AWS, Google Cloud, or Azure.
License
The model is released under the llama3.2 license. For more details, refer to the license file.