SmolLM-135M-Instruct

HuggingFaceTB

Introduction

SmolLM is a series of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are trained on the SmolLM-Corpus, which is a curated collection of high-quality educational and synthetic data. SmolLM-Instruct models are fine-tuned on publicly available datasets to optimize performance for instructional tasks.
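
All three sizes share a common naming scheme on the Hugging Face Hub. As a quick orientation, the checkpoint IDs below follow that pattern; only the 135M instruct variant is named explicitly in this card, so treat the other two IDs as the expected pattern rather than verified here:

    # Instruct checkpoints by parameter count. Only the 135M repository ID
    # appears in this card; the 360M and 1.7B IDs follow the same pattern.
    CHECKPOINTS = {
        "135M": "HuggingFaceTB/SmolLM-135M-Instruct",
        "360M": "HuggingFaceTB/SmolLM-360M-Instruct",
        "1.7B": "HuggingFaceTB/SmolLM-1.7B-Instruct",
    }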

Architecture

SmolLM models are designed to balance efficiency and performance across all three sizes. The instruct variants are produced by fine-tuning the base models on datasets such as WebInstructSub and StarCoder2-Self-OSS-Instruct. Version 0.2 adjusts this fine-tuning mixture to improve response quality and topic adherence.
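
For readers who want to inspect the fine-tuning data, both corpora are available on the Hugging Face Hub. A minimal sketch with the datasets library, assuming the commonly used Hub IDs for these two datasets (verify against the model card before training):

    from datasets import load_dataset

    # Hub IDs below are assumptions for illustration, not taken from this card.
    webinstruct = load_dataset("TIGER-Lab/WebInstructSub", split="train")
    self_oss = load_dataset(
        "bigcode/self-oss-instruct-sc2-exec-filter-50k", split="train"
    )
    print(webinstruct[0])  # one instruction/response record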

Training

SmolLM-Instruct models are trained with the alignment-handbook, using a learning rate of 1e-3, a cosine schedule, and a warmup ratio of 0.1. Training runs for one epoch with a global batch size of 262k tokens. The fine-tuning datasets include Magpie-Pro-300K-Filtered, OpenHermes-2.5, and others selected to enhance conversational ability and topic adherence.
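
The alignment-handbook expresses these settings in its own recipe files; as an illustrative sketch only, the same hyperparameters map onto transformers.TrainingArguments as follows (batch-size fields are placeholders, since the card states the global batch in tokens, not sequences):

    from transformers import TrainingArguments

    # Hyperparameters stated above: lr 1e-3, cosine schedule, 0.1 warmup, 1 epoch.
    args = TrainingArguments(
        output_dir="smollm-instruct-sft",   # placeholder path
        learning_rate=1e-3,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        num_train_epochs=1,
        per_device_train_batch_size=4,      # placeholder; card gives 262k tokens
        gradient_accumulation_steps=32,     # placeholder; tune to match that global batch
        bf16=True,                          # assumption; precision is not stated in this card
    )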

Guide: Running Locally

To run SmolLM models locally:

  1. Install the Transformers library:

    pip install transformers
    
  2. Load the model:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    checkpoint = "HuggingFaceTB/SmolLM-135M-Instruct"
    device = "cuda"  # Use "cpu" if GPU is unavailable
    
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    
  3. Generate text:

    messages = [{"role": "user", "content": "What is the capital of France?"}]
    input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
    print(tokenizer.decode(outputs[0]))  # full template; see the note after this list
    
  4. Optional: Use TRL CLI for terminal chat:

    pip install trl
    trl chat --model_name_or_path HuggingFaceTB/SmolLM-135M-Instruct --device cpu
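
Note that the print call in step 3 emits the entire chat template, prompt and special tokens included. To display only the assistant's reply, slice off the prompt tokens before decoding; a minimal continuation of step 3:

    # Decode only the newly generated tokens, dropping the prompt and
    # special tokens from the printed output.
    new_tokens = outputs[0][inputs.shape[-1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))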
    

For optimal performance, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
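
On a GPU instance, loading the weights in half precision cuts memory use roughly in half. A minimal sketch, assuming the accelerate package is installed for device_map support:

    import torch
    from transformers import AutoModelForCausalLM

    # bfloat16 is an assumption; the card does not state a serving precision.
    model = AutoModelForCausalLM.from_pretrained(
        "HuggingFaceTB/SmolLM-135M-Instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires accelerate; places layers on available GPUs
    )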

License

SmolLM models are released under the Apache 2.0 license, which allows for both commercial and non-commercial use.
