SmolLM2-1.7B-Instruct
by HuggingFaceTB

Introduction
SmolLM2 is a series of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are designed to perform a variety of tasks efficiently on-device. The 1.7B variant shows significant improvements in instruction following, knowledge, reasoning, and mathematics compared to its predecessor, SmolLM1-1.7B.
Architecture
The SmolLM2 models are built on a transformer decoder architecture. The 1.7B model was pretrained on 11 trillion tokens in bfloat16 precision, with training carried out on 256 H100 GPUs using the nanotron framework.
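Because the weights are stored in bfloat16, loading them in that dtype roughly halves memory use compared with float32. A minimal sketch, using the standard transformers torch_dtype argument (nothing here is specific to SmolLM2):

import torch
from transformers import AutoModelForCausalLM

# Load the checkpoint in its native bfloat16 precision (~3.4 GB for
# 1.7B parameters instead of ~6.8 GB in float32).
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-1.7B-Instruct",
    torch_dtype=torch.bfloat16,
)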
Training
SmolLM2-1.7B was trained using a diverse dataset, including FineWeb-Edu, DCLM, and The Stack, as well as newly curated mathematics and coding datasets. The model underwent supervised fine-tuning (SFT) with both public and proprietary datasets, followed by Direct Preference Optimization (DPO) using UltraFeedback.
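For readers curious what the DPO stage looks like in practice, the trl library implements it. The sketch below is illustrative only: the starting checkpoint, dataset split, and hyperparameters are assumptions, and the exact DPOTrainer arguments vary across trl versions.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B"  # assumed SFT starting point
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# UltraFeedback preference pairs: each row has a chosen and a rejected reply.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="smollm2-dpo", beta=0.1),  # beta is illustrative
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()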
Guide: Running Locally
- Install the Transformers Library

pip install transformers
- Load the Model and Tokenizer

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
device = "cuda"  # Use "cpu" if GPU is unavailable
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
- Generate Text

messages = [{"role": "user", "content": "What is the capital of France?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))

For token-by-token output, see the streaming sketch after this list.
- Use Cloud GPUs

For better performance, consider cloud-based GPU services such as AWS, Google Cloud, or Azure.
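If you want the reply printed as it is generated rather than after generate returns, transformers ships a TextStreamer utility. This is standard library usage, not specific to SmolLM2, and reuses the model, tokenizer, and inputs from the steps above:

from transformers import TextStreamer

# Print tokens to stdout as they are produced, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2,
                         top_p=0.9, do_sample=True, streamer=streamer)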
License
SmolLM2 is licensed under the Apache 2.0 License.