SmolLM2-135M-Instruct
Introduction
SmolLM2 is a family of compact language models with sizes ranging from 135M to 1.7B parameters. These models are designed to handle a wide array of tasks while being efficient enough for on-device usage. The SmolLM2 series shows improvements in instruction following, knowledge, and reasoning capabilities compared to its predecessor, SmolLM1.
Architecture
The SmolLM2 models employ a Transformer decoder architecture, pretrained on 2 trillion tokens in bfloat16 precision. The instruct variant is produced through supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO).
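As a quick sanity check of these details, the released configuration can be inspected without downloading the full weights. A minimal sketch using the standard `transformers` API (the exact field values depend on the published config):

```python
from transformers import AutoConfig

# Fetch only the config, not the weights.
config = AutoConfig.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")

print(config.model_type)        # decoder-only architecture family
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)
print(config.torch_dtype)       # expected: bfloat16, matching the pretraining precision
```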
Training
- Datasets: The models were trained using a diverse combination of datasets, including FineWeb-Edu, DCLM, The Stack, and newly curated datasets.
- Fine-Tuning: SFT was applied using both public and proprietary datasets, followed by DPO using UltraFeedback.
- Hardware: Training was conducted on 64 H100 GPUs.
- Software: The training framework used is Nanotron.
Guide: Running Locally
- Installation: Install the necessary packages via pip:

```bash
pip install transformers
```
- Model Loading:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-135M-Instruct"
device = "cuda"  # Use "cpu" if GPU is unavailable

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
```
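Because the weights were trained in bfloat16, the model can optionally be loaded in that precision to roughly halve memory use versus float32. A variant of the call above, using the standard `torch_dtype` argument:

```python
import torch
from transformers import AutoModelForCausalLM

# Assumes `checkpoint` and `device` from the snippet above; bfloat16 requires
# hardware/backend support (recent GPUs, or CPU bf16 kernels in PyTorch).
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
).to(device)
```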
- Inference: Prepare inputs and generate outputs.

```python
messages = [{"role": "user", "content": "What is gravity?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(
    inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True
)
print(tokenizer.decode(outputs[0]))
```
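Note that `apply_chat_template` as called above formats the conversation as-is; to have generation begin a fresh assistant turn, the standard `add_generation_prompt=True` flag can be passed (a small variant of the snippet above):

```python
# Append the assistant header so the model starts a new reply instead of
# continuing the user's message.
input_text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```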
- Using TRL CLI:

```bash
pip install trl
trl chat --model_name_or_path HuggingFaceTB/SmolLM2-135M-Instruct --device cpu
```
For larger model sizes, consider running on cloud GPUs such as those offered by AWS, Google Cloud, or Azure.
License
The SmolLM2 models are licensed under the Apache 2.0 License.