YuLan-Mini
Introduction
YuLan-Mini is a data-efficient language model developed by the AI Box team at Renmin University of China. It has 2.4 billion parameters and is designed to perform well in the mathematics and code domains while using only 1.08 trillion tokens for pre-training. The model aims to match the performance of larger models trained on significantly more data.
Architecture
YuLan-Mini incorporates a novel pre-training methodology that enhances training efficiency through:
- A sophisticated data pipeline for data cleaning and scheduling.
- Systematic optimization to reduce training instability.
- Targeted data selection and long context training for efficient annealing.
Training
YuLan-Mini was pre-trained on a diverse set of datasets, including those focused on mathematics and programming. It supports text-generation tasks and has demonstrated strong performance across various benchmarks such as HumanEval, GSM8K, and MATH-500.
Guide: Running Locally
To run YuLan-Mini locally, follow these steps:
1. Install Dependencies: Ensure you have Python installed along with the torch and transformers libraries.
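   A typical installation via pip (exact versions and CUDA builds may vary with your environment):

   ```bash
   pip install torch transformers
   ```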
2. Load Model and Tokenizer:

   ```python
   import torch
   from transformers import AutoTokenizer, AutoModelForCausalLM

   # Download the tokenizer and model weights from the Hugging Face Hub,
   # loading the weights in bfloat16 to reduce memory usage.
   tokenizer = AutoTokenizer.from_pretrained("yulan-team/YuLan-Mini")
   model = AutoModelForCausalLM.from_pretrained(
       "yulan-team/YuLan-Mini", torch_dtype=torch.bfloat16
   )
   ```
3. Perform Inference:

   ```python
   # Tokenize a prompt, generate up to 100 new tokens, and decode the result.
   input_text = "Renmin University of China is"
   inputs = tokenizer(input_text, return_tensors="pt")
   output = model.generate(inputs["input_ids"], max_new_tokens=100)
   print(tokenizer.decode(output[0], skip_special_tokens=True))
   ```
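   If a GPU is available, generation is much faster with the model and inputs on the device. A minimal self-contained sketch (assumes a CUDA-capable GPU and falls back to CPU otherwise):

   ```python
   import torch
   from transformers import AutoTokenizer, AutoModelForCausalLM

   # Pick the device at runtime rather than hard-coding it.
   device = "cuda" if torch.cuda.is_available() else "cpu"

   tokenizer = AutoTokenizer.from_pretrained("yulan-team/YuLan-Mini")
   model = AutoModelForCausalLM.from_pretrained(
       "yulan-team/YuLan-Mini", torch_dtype=torch.bfloat16
   ).to(device)

   # Move the tokenized inputs to the same device as the model.
   inputs = tokenizer("Renmin University of China is", return_tensors="pt").to(device)
   output = model.generate(**inputs, max_new_tokens=100)
   print(tokenizer.decode(output[0], skip_special_tokens=True))
   ```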
4. Serve Model:

   - Using vLLM:

     ```bash
     vllm serve yulan-team/YuLan-Mini --dtype bfloat16
     ```

   - Using SGLang:

     ```bash
     python -m sglang.launch_server --model-path yulan-team/YuLan-Mini --port 30000 --host 0.0.0.0
     ```
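   Both servers expose an OpenAI-compatible API. A minimal client sketch, assuming vLLM's default port of 8000 (for the SGLang command above, point base_url at port 30000 instead):

   ```python
   from openai import OpenAI

   # Local inference servers typically accept any placeholder API key.
   client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

   response = client.completions.create(
       model="yulan-team/YuLan-Mini",
       prompt="Renmin University of China is",
       max_tokens=100,
   )
   print(response.choices[0].text)
   ```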
For optimal performance, consider using cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
License
YuLan-Mini is released under the MIT License. Policies concerning the model weights and data usage will be updated in future releases. Although efforts have been made to promote safe and ethical generation, users should be aware that model outputs may still contain biases or harmful content.