SmolLM-135M-Instruct
by HuggingFaceTB

Introduction
SmolLM is a series of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. They are trained on SmolLM-Corpus, a curated collection of high-quality educational and synthetic data. The SmolLM-Instruct variants are further fine-tuned on publicly available datasets for instruction-following tasks.
Architecture
SmolLM models are designed for efficiency and performance across all three sizes. The instruct variants are fine-tuned on datasets such as WebInstructSub and StarCoder2-Self-OSS-Instruct. Version 0.2 adjusts the fine-tuning mixture to improve response quality and topic adherence.
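Since v0.2 supersedes the earlier instruct checkpoints, an older revision can be pinned at load time if the repository still tags it. The tag name below is an assumption for illustration; check the model page for the tags actually published:

    from transformers import AutoModelForCausalLM

    # Pin a specific repo revision (branch or tag) instead of the default branch.
    # "v0.1" is a hypothetical tag name; omit revision to get the latest weights.
    model = AutoModelForCausalLM.from_pretrained(
        "HuggingFaceTB/SmolLM-135M-Instruct",
        revision="v0.1",
    )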
Training
SmolLM-Instruct models are trained with the Hugging Face alignment-handbook, using a learning rate of 1e-3, a cosine schedule, and a warmup ratio of 0.1, for one epoch with a global batch size of roughly 262k tokens. The training datasets include Magpie-Pro-300K-Filtered, OpenHermes-2.5, and others chosen to improve conversational ability and topic adherence.
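The exact alignment-handbook recipe is not reproduced here, but as a rough sketch the stated hyperparameters map onto transformers TrainingArguments as follows. The output directory and batch-size fields are placeholders, since the reported 262k figure is a global token count that depends on sequence length, per-device batch size, gradient accumulation, and GPU count:

    from transformers import TrainingArguments

    # Sketch only: SmolLM-Instruct's reported hyperparameters expressed as
    # TrainingArguments. output_dir and the batch-size fields are placeholders.
    args = TrainingArguments(
        output_dir="smollm-instruct-sft",  # hypothetical path
        learning_rate=1e-3,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        num_train_epochs=1,
        per_device_train_batch_size=4,   # placeholder; tune these two so the
        gradient_accumulation_steps=8,   # global batch reaches ~262k tokens
        bf16=True,                       # assumption: mixed-precision training
    )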
Guide: Running Locally
To run SmolLM models locally:
- Install the Transformers library:

    pip install transformers
- Load the model:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "HuggingFaceTB/SmolLM-135M-Instruct"
    device = "cuda"  # use "cpu" if a GPU is unavailable
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
- Generate text (see the note on add_generation_prompt after this list):

    # Build a chat-formatted prompt, then sample a short completion.
    messages = [{"role": "user", "content": "What is the capital of France?"}]
    input_text = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
    print(tokenizer.decode(outputs[0]))
- Optional: use the TRL CLI to chat from the terminal:

    pip install trl
    trl chat --model_name_or_path HuggingFaceTB/SmolLM-135M-Instruct --device cpu
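One refinement to step 3: with chat-tuned checkpoints, apply_chat_template is commonly called with add_generation_prompt=True so the rendered prompt ends with the assistant-turn marker and the model starts a reply rather than continuing the user message. A minimal variant of the prompt-building line:

    input_text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,  # append the assistant header before generating
    )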
For optimal performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
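Whichever hardware is available, the device string from step 2 can also be chosen at runtime rather than hard-coded; a small sketch:

    import torch

    # Prefer a GPU when one is present, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"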
License
SmolLM models are released under the Apache 2.0 license, which allows for both commercial and non-commercial use.