SmolLM2-360M-Instruct
Maintainer: HuggingFaceTB
Introduction
SmolLM2 is a family of compact language models with sizes of 135M, 360M, and 1.7B parameters, designed for efficient on-device operation. The models excel in instruction following, knowledge, and reasoning tasks, offering improvements over their predecessor, SmolLM1.
Architecture
The SmolLM2 models use a Transformer decoder architecture; the 360M variant was trained on 4 trillion tokens in bfloat16 precision. Pretraining drew on a combination of diverse datasets, including FineWeb-Edu, DCLM, and The Stack, followed by supervised fine-tuning (SFT) and Direct Preference Optimization (DPO).
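To make the architecture concrete, the configuration can be inspected with the standard Transformers APIs, and the weights can be loaded in the bfloat16 precision used during training. This is a minimal sketch, not from the model card: the printed attributes are standard Llama-style config fields assumed to apply here, and device_map="auto" requires the accelerate package.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"

# Inspect the decoder configuration without loading the weights.
config = AutoConfig.from_pretrained(checkpoint)
print(config.model_type, config.hidden_size, config.num_hidden_layers, config.num_attention_heads)

# Load the weights in bfloat16 to match the training precision
# (device_map="auto" places the model on GPU if available; requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)
```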
Training
Training was conducted on 64 H100 GPUs using the nanotron training framework. The instruct version was built with a mix of public and curated datasets, adding capabilities such as text rewriting, summarization, and, for the larger models, function calling.
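These instruction-following capabilities are exposed through ordinary chat messages. The snippet below sketches a summarization request; it assumes the tokenizer, model, and device are set up as in the guide below, and the prompt wording is illustrative rather than taken from the model card.

```python
article = "SmolLM2 is a family of compact language models ..."  # placeholder text to summarize

messages = [{"role": "user", "content": f"Summarize the following text in one sentence:\n\n{article}"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=60, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```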
Guide: Running Locally
- Install the Transformers Library:

```bash
pip install transformers
```
- Load the Model and Tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM2-360M-Instruct"
device = "cuda"  # Use "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
```
- Generate Text (a pipeline-based alternative is sketched after this list):

```python
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
```
- Use Cloud GPUs: For optimal performance, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure.
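As an alternative to the manual tokenize-and-generate steps above, recent Transformers releases can route chat-formatted messages through the text-generation pipeline directly. This is a sketch under that assumption; the exact return format can vary between versions.

```python
from transformers import pipeline

# Build a text-generation pipeline; device=0 selects the first GPU, device=-1 falls back to CPU.
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M-Instruct", device=-1)

messages = [{"role": "user", "content": "What is the capital of France?"}]
result = generator(messages, max_new_tokens=50, do_sample=True, temperature=0.2, top_p=0.9)

# generated_text holds the conversation, including the model's reply as the final message.
print(result[0]["generated_text"])
```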
License
SmolLM2 is licensed under the Apache 2.0 License, which allows for use, distribution, and modification under specified conditions.