YuLan-Mini-GGUF
QuantFactory
Introduction
YuLan-Mini is a lightweight language model developed by AI Box at Renmin University of China. With 2.4 billion parameters, it performs comparably to larger models despite being pre-trained on only 1.08 trillion tokens. The model excels in mathematics and code generation, and its development emphasizes data efficiency.
Architecture
YuLan-Mini itself is a standard decoder-only transformer language model; this release packages it in the GGUF format, quantized with llama.cpp. Quantization stores the weights at reduced precision, shrinking the model's memory footprint and making local inference practical, including on CPU-only hardware.
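The idea behind GGUF-style quantization can be sketched with a minimal symmetric 8-bit scheme. This is an illustration of block-wise quantization in general, not the bit layout of any actual GGUF format (real formats such as Q4_K_M or Q8_0 use more elaborate per-block encodings):

```python
import numpy as np

def quantize_q8(block):
    # One scale per block of weights; values are mapped to int8 in [-127, 127].
    # Real GGUF formats use the same per-block-scale idea with different layouts.
    scale = float(np.abs(block).max()) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(block / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_q8(q, scale):
    # Recover approximate float weights for use during inference.
    return q.astype(np.float32) * scale

weights = np.linspace(-1.0, 1.0, 8, dtype=np.float32)
q, scale = quantize_q8(weights)
restored = dequantize_q8(q, scale)
# Round-trip error is bounded by half a quantization step.
assert np.abs(weights - restored).max() <= scale / 2 + 1e-6
```

Each block trades 4 bytes per float32 weight for 1 byte plus a shared scale, at the cost of a small, bounded rounding error.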
Training
YuLan-Mini's training involves several innovative strategies:
- A carefully crafted data pipeline integrating data cleaning and scheduling strategies.
- A systematic optimization method to address training instability.
- An annealing approach that focuses on targeted data selection and long-context training.
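The annealing strategy above typically means lowering the learning rate over a final training phase while feeding in curated, higher-quality data. A minimal sketch of a warmup-stable-decay schedule of the kind often used for such training (the peak learning rate and phase fractions here are illustrative assumptions, not YuLan-Mini's actual hyperparameters):

```python
def wsd_lr(step, total_steps, peak_lr=3e-4, warmup_frac=0.01, decay_frac=0.1):
    # Warmup-Stable-Decay: linear warmup, a long flat "stable" phase,
    # then a final annealing phase where the learning rate decays to zero.
    # All constants are illustrative, not YuLan-Mini's published values.
    warmup = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup:
        return peak_lr * step / max(warmup, 1)
    if step < decay_start:
        return peak_lr
    # Annealing phase: linear decay to zero over the last decay_frac of steps.
    remaining = total_steps - step
    return peak_lr * remaining / (total_steps - decay_start)

schedule = [wsd_lr(s, 1000) for s in range(1000)]
assert max(schedule) == 3e-4          # stable phase sits at the peak rate
assert schedule[-1] < schedule[905]   # rate falls during the annealing phase
```

The targeted data selection would then amount to switching the data mixture at `decay_start`, so the cleanest data coincides with the lowest learning rates.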
Guide: Running Locally
To run YuLan-Mini locally, follow these steps:

1. Install the required libraries:

   pip install torch transformers

2. Use the following Python script for inference:

   import torch
   from transformers import AutoTokenizer, AutoModelForCausalLM

   tokenizer = AutoTokenizer.from_pretrained("yulan-team/YuLan-Mini")
   model = AutoModelForCausalLM.from_pretrained(
       "yulan-team/YuLan-Mini", torch_dtype=torch.bfloat16
   )

   input_text = "Renmin University of China is"
   inputs = tokenizer(input_text, return_tensors="pt")
   output = model.generate(inputs["input_ids"], max_new_tokens=100)
   print(tokenizer.decode(output[0], skip_special_tokens=True))
3. For enhanced performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
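A back-of-envelope estimate shows why GPU memory and quantization matter here: weight storage scales with parameter count times bits per weight. This sketch counts weights only (activations and the KV cache add overhead), and the 4.5 bits/weight figure is an assumed mid-range GGUF quantization level, not a measurement of this release:

```python
def model_size_gb(n_params, bits_per_param):
    # Weight storage only; runtime memory is higher due to activations
    # and the KV cache.
    return n_params * bits_per_param / 8 / 1024**3

n = 2.4e9  # YuLan-Mini's parameter count
for name, bits in [("float32", 32), ("bfloat16", 16), ("~4-bit GGUF", 4.5)]:
    print(f"{name}: {model_size_gb(n, bits):.1f} GiB")
```

At bfloat16 the weights alone take roughly 4.5 GiB, while a ~4-bit GGUF quantization fits in well under 2 GiB, which is why the quantized files run comfortably on modest local hardware.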
License
YuLan-Mini is released under the MIT License, allowing for flexibility in use. However, users should be aware of potential ethical concerns and avoid spreading harmful content generated by the model. The policies for using model weights, optimizer states, and training data will be updated in the future.