SmolLM-135M
by HuggingFaceTB
Introduction
SmolLM-135M is a state-of-the-art small language model, part of the SmolLM series, which includes models with 135M, 360M, and 1.7B parameters. These models are built using the Cosmo-Corpus, a high-quality training dataset comprising Cosmopedia v2, Python-Edu, and FineWeb-Edu. SmolLM models excel in benchmarks that assess common sense reasoning and world knowledge.
Architecture
SmolLM models are trained using the Nanotron framework with 600,000 pretraining steps over 600 billion tokens. The models employ bfloat16 precision and use the HuggingFaceTB/cosmo2-tokenizer. Training was conducted on 64 H100 GPUs.
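To see how these details are reflected in the released checkpoint, the short sketch below reads the model configuration and tokenizer with the transformers library. The attribute names it prints (hidden_size, num_hidden_layers, num_attention_heads) are assumed to be present on the standard decoder config bundled with the checkpoint; verify them against the checkpoint's config.json.

```python
# Minimal sketch: inspect the checkpoint's configuration with transformers.
# The attribute names below are assumed to exist on the standard decoder
# config shipped with the checkpoint; check config.json if they differ.
from transformers import AutoConfig, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M"

config = AutoConfig.from_pretrained(checkpoint)
print(config.model_type)           # architecture family
print(config.hidden_size)          # embedding width
print(config.num_hidden_layers)    # transformer depth
print(config.num_attention_heads)  # attention heads per layer

# The tokenizer (HuggingFaceTB/cosmo2-tokenizer) is also bundled with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print(tokenizer.vocab_size)
```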
Training
The training dataset, Cosmo-Corpus, is meticulously curated and includes diverse sources such as synthetic textbooks, educational Python samples, and educational web content. This diverse corpus aims to enhance the model's understanding and generation capabilities.
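If you want to inspect the corpus itself, the hedged sketch below streams a few samples with the datasets library. The dataset identifier and subset name are assumptions based on the naming of the SmolLM release on the Hugging Face Hub; confirm the exact IDs on the Hub before running it.

```python
# Hedged sketch: stream a handful of corpus samples without downloading the
# full dataset. "HuggingFaceTB/smollm-corpus" and "cosmopedia-v2" are assumed
# identifiers; confirm them on the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceTB/smollm-corpus",  # assumed dataset ID
    "cosmopedia-v2",                # assumed subset name
    split="train",
    streaming=True,                 # iterate without a full download
)

for i, sample in enumerate(ds):
    print(sample.keys())
    if i >= 2:
        break
```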
Guide: Running Locally
To run SmolLM-135M locally, follow these steps:
- Installation: Ensure you have Python installed, then install the necessary libraries with pip:

  ```bash
  pip install transformers accelerate bitsandbytes
  ```
- Loading the Model: You can load and run the model on either CPU or GPU. For GPU usage, ensure you have CUDA installed and use the following snippet (a quantized-loading variant is sketched after this list):

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  checkpoint = "HuggingFaceTB/SmolLM-135M"
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForCausalLM.from_pretrained(checkpoint).to("cuda")  # drop .to("cuda") for CPU

  # Generate a completion for a short code prompt
  inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to("cuda")
  outputs = model.generate(inputs)
  print(tokenizer.decode(outputs[0]))
  ```
- Using Cloud GPUs: For better performance, consider cloud services such as AWS, Google Cloud, or Azure, which provide access to high-performance GPUs.
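The bitsandbytes package installed in the first step can also be used to load the model with 8-bit weights to reduce GPU memory use. The sketch below is illustrative rather than part of the original guide; it assumes a CUDA-capable GPU, and the quantization settings shown are a reasonable default, not the model authors' recommendation.

```python
# Hedged sketch: load SmolLM-135M with 8-bit weights via bitsandbytes.
# These quantization settings are illustrative, not from the original guide;
# they require a CUDA GPU plus the accelerate and bitsandbytes packages.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "HuggingFaceTB/SmolLM-135M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the weights on the GPU
)

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```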
License
SmolLM-135M is licensed under the Apache 2.0 License. This allows for both personal and commercial use, provided that proper attribution is given and any modifications are documented.