SmolLM2-1.7B-Instruct

HuggingFaceTB

Introduction

SmolLM2 is a series of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are designed to perform a variety of tasks efficiently on-device. The 1.7B variant shows significant improvements in instruction following, knowledge, reasoning, and mathematics compared to its predecessor, SmolLM1-1.7B.

Architecture

The SmolLM2 models use a transformer decoder architecture. The 1.7B variant was pretrained on 11 trillion tokens in bfloat16 precision, with training run on 256 H100 GPUs using the nanotron framework.
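
Because the published weights are in bfloat16, the model can be loaded directly in that precision, roughly halving memory use compared to a default float32 load. A minimal sketch (torch_dtype and get_memory_footprint are standard transformers API; nothing here is specific to SmolLM2):

    import torch
    from transformers import AutoModelForCausalLM

    # Load the checkpoint in its native bfloat16 precision.
    model = AutoModelForCausalLM.from_pretrained(
        "HuggingFaceTB/SmolLM2-1.7B-Instruct",
        torch_dtype=torch.bfloat16,
    )
    print(model.get_memory_footprint())  # approximate weight size in bytes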

Training

SmolLM2-1.7B was trained using a diverse dataset, including FineWeb-Edu, DCLM, and The Stack, as well as newly curated mathematics and coding datasets. The model underwent supervised fine-tuning (SFT) with both public and proprietary datasets, followed by Direct Preference Optimization (DPO) using UltraFeedback.
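
For intuition, DPO skips a separate reward model and optimizes the policy directly on preference pairs such as those in UltraFeedback. The sketch below is an illustrative from-scratch rendering of the DPO loss, not the actual SmolLM2 training code; the beta value and the toy log-probabilities are assumptions:

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """DPO loss over a batch of preference pairs.

        Each tensor holds summed token log-probabilities of the chosen or
        rejected response under the policy or the frozen reference model;
        beta controls how far the policy may drift from the reference.
        """
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Maximize the margin between chosen and rejected implicit rewards.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Toy usage with made-up log-probabilities for two preference pairs.
    loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-15.0, -11.0]),
                    torch.tensor([-12.5, -9.8]), torch.tensor([-14.0, -10.5]))
    print(loss.item())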

Guide: Running Locally

  1. Install Dependencies

    pip install transformers torch
    
  2. Load the Model and Tokenizer

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
    device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    
  3. Generate Text

    messages = [{"role": "user", "content": "What is the capital of France?"}]
    # add_generation_prompt appends the assistant turn marker so the model
    # answers the question rather than continuing the user's message.
    input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    # A streaming variant is sketched after this guide.
    
  4. Use Cloud GPUs
    For better performance, consider using cloud-based GPU services such as AWS, Google Cloud, or Azure.
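
As referenced in step 3, the response can also be streamed token by token instead of printed all at once. TextStreamer is part of the transformers API; the snippet below reuses the tokenizer, model, and inputs from the guide and keeps the same sampling settings:

    from transformers import TextStreamer

    # Print tokens to stdout as they are generated.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2,
                             top_p=0.9, do_sample=True, streamer=streamer)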

License

SmolLM2 is licensed under the Apache 2.0 License.
