Llama 3.2 Taiwan 3B
Introduction
The Llama-3.2-Taiwan-3B model is designed for text generation tasks, with a focus on Traditional Chinese alongside multilingual capabilities. Developed by Lianghsun, it is trained on a large corpus of Traditional Chinese text together with a range of multilingual datasets, and is positioned as a small language model optimized for environments with limited GPU resources.
Architecture
Llama-3.2-Taiwan-3B is built on the meta-llama/Llama-3.2-3B foundation model. Its training data spans diverse datasets chosen to reflect Taiwan's linguistic styles and usage patterns. The small model size keeps hardware requirements low, making instruction fine-tuning and deployment feasible in resource-constrained settings.
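As a quick illustration of the intended use, the model can be loaded for plain text completion with the Hugging Face transformers library. This is a minimal sketch, assuming transformers and torch are installed and sufficient memory is available; only the model ID comes from this card, while the prompt and generation settings are arbitrary. Because this is the base model rather than an instruct variant, the example uses plain completion instead of a chat template.

# Minimal text-generation sketch using the transformers pipeline.
# Only the model ID is taken from this card; everything else is illustrative.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="lianghsun/Llama-3.2-Taiwan-3B",
    torch_dtype="auto",
    device_map="auto",
)

# As a base (non-instruct) model, it simply continues the Traditional Chinese prompt.
result = generator("台灣的夜市文化", max_new_tokens=64)
print(result[0]["generated_text"])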
Training
The model underwent continual pretraining using a variety of Traditional Chinese and multilingual datasets, such as:
- lianghsun/tw-novel-1.1B
- lianghsun/tw-finance-159M
- lianghsun/tw-legal-news-24M
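The corpora above are published as Hugging Face datasets. As a minimal sketch, assuming the datasets library is installed and the repositories are publicly accessible (split and column names may differ from what is shown), one of them can be inspected like this:

# Illustrative only: peek at one of the listed pretraining corpora.
# The split name "train" and the record layout are assumptions.
from datasets import load_dataset

ds = load_dataset("lianghsun/tw-novel-1.1B", split="train")
print(ds)     # dataset size and column names
print(ds[0])  # first record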
Training was performed with the following hyperparameters:
- Learning rate: 5e-6
- Train batch size: 8
- Total training duration: 8 days
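The card does not state which training framework was used. Purely as a sketch of how the reported values map onto Hugging Face TrainingArguments; the output directory, precision, and every unlisted setting below are assumptions, not documented facts.

# Sketch only: reported hyperparameters expressed as TrainingArguments.
# Scheduler, epochs, gradient accumulation, etc. are not documented on the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.2-taiwan-3b-cpt",  # illustrative path
    learning_rate=5e-6,                    # reported learning rate
    per_device_train_batch_size=8,         # reported train batch size
    bf16=True,                             # assumption; precision is not stated
)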
Preprocessing involved formatting text to handle mixed full-width and half-width characters and truncating text exceeding the model's cutoff length of 4096 tokens.
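The exact full-width/half-width formatting rules are not spelled out, but the truncation step can be sketched with the model's own tokenizer. Only the 4096-token cutoff is taken from the card; the helper below is hypothetical.

# Hypothetical helper: clip a sample to the 4096-token cutoff length.
# Full-width/half-width normalization is dataset-specific and omitted here.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lianghsun/Llama-3.2-Taiwan-3B")

def truncate_sample(text: str, cutoff_len: int = 4096) -> str:
    ids = tokenizer(text, truncation=True, max_length=cutoff_len)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)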
Guide: Running Locally
To run the model locally, you can use the vLLM Docker image:
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model lianghsun/Llama-3.2-Taiwan-3B
To load a specific checkpoint version, add the --revision <tag_name> option. For best performance, running the server on a cloud GPU from a major provider is recommended.
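Once the container is running, vLLM serves an OpenAI-compatible API on port 8000. A minimal sketch of querying it with the openai Python client follows; the base URL matches the command above, and the placeholder API key is accepted because the server was started without --api-key.

# Query the OpenAI-compatible endpoint exposed by the vLLM container above.
# Assumes the openai Python package; prompt and settings are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="lianghsun/Llama-3.2-Taiwan-3B",
    prompt="台灣最高的山是",
    max_tokens=64,
)
print(completion.choices[0].text)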
License
The model is released under the llama3.2 license (the Llama 3.2 Community License). For further details, refer to the license documentation.