SmolLM2-1.7B-Instruct

HuggingFaceTB

Introduction

SmolLM2 is a series of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are designed to perform a variety of tasks efficiently on-device. The 1.7B variant shows significant improvements in instruction following, knowledge, reasoning, and mathematics compared to its predecessor, SmolLM1-1.7B.

Architecture

The SmolLM2 models use a transformer decoder architecture. The 1.7B variant was pretrained on 11 trillion tokens in bfloat16 precision, with training run on 256 H100 GPUs using the nanotron framework.
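
Because the published weights are in bfloat16, the model can be loaded directly in that precision, roughly halving memory use compared to a default float32 load. A minimal sketch (torch_dtype and get_memory_footprint are standard transformers API; nothing here is specific to SmolLM2):

    import torch
    from transformers import AutoModelForCausalLM

    # Load the checkpoint in its native bfloat16 precision.
    model = AutoModelForCausalLM.from_pretrained(
        "HuggingFaceTB/SmolLM2-1.7B-Instruct",
        torch_dtype=torch.bfloat16,
    )
    print(model.get_memory_footprint())  # approximate weight size in bytes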

Training

SmolLM2-1.7B was trained using a diverse dataset, including FineWeb-Edu, DCLM, and The Stack, as well as newly curated mathematics and coding datasets. The model underwent supervised fine-tuning (SFT) with both public and proprietary datasets, followed by Direct Preference Optimization (DPO) using UltraFeedback.
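
For intuition, DPO skips a separate reward model and optimizes the policy directly on preference pairs such as those in UltraFeedback. The sketch below is an illustrative from-scratch rendering of the DPO loss, not the actual SmolLM2 training code; the beta value and the toy log-probabilities are assumptions:

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """DPO loss over a batch of preference pairs.

        Each tensor holds summed token log-probabilities of the chosen or
        rejected response under the policy or the frozen reference model;
        beta controls how far the policy may drift from the reference.
        """
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the margin between chosen and rejected implicit rewards.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage with made-up log-probabilities for two preference pairs.
    loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-15.0, -11.0]),
                    torch.tensor([-12.5, -9.8]), torch.tensor([-14.0, -10.5]))
    print(loss.item())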

Guide: Running Locally

  1. Install Dependencies

    pip install transformers torch
    
  2. Load the Model and Tokenizer

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
    device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    
  3. Generate Text

    messages = [{"role": "user", "content": "What is the capital of France?"}]
    # add_generation_prompt appends the assistant turn marker so the model
    # answers the question rather than continuing the user's message.
    input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    # A streaming variant is sketched after this guide.
    
  4. Use Cloud GPUs
    For better performance, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure.
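
As referenced in step 3, the response can also be streamed token by token instead of printed all at once. TextStreamer is part of the transformers API; the snippet below reuses the tokenizer, model, and inputs from the guide and keeps the same sampling settings:

    from transformers import TextStreamer

    # Print tokens to stdout as they are generated.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2,
                             top_p=0.9, do_sample=True, streamer=streamer)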

License

SmolLM2 is licensed under the Apache 2.0 License.
