SmolLM2-CoT-360M

prithivMLmods

Introduction

SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. The models are lightweight enough to run on-device. SmolLM2-CoT-360M is the 360M-parameter variant aimed at text generation and reasoning tasks.
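
To make "lightweight enough to run on-device" concrete, the sketch below estimates how much memory the weights alone would occupy at common precisions. It uses the nominal parameter counts quoted above (exact counts differ slightly) and ignores activations, the KV cache, and framework overhead, so treat the numbers as a rough lower bound rather than a measurement.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Nominal parameter counts from the text; exact counts differ slightly.
NOMINAL_PARAMS = {"135M": 135e6, "360M": 360e6, "1.7B": 1.7e9}

# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mb(size: str, dtype: str) -> float:
    """Return the approximate weight storage in megabytes (10**6 bytes)."""
    return NOMINAL_PARAMS[size] * BYTES_PER_PARAM[dtype] / 1e6


# Print one row per model size, e.g. the 360M model at float16 is about 720 MB.
for size in NOMINAL_PARAMS:
    row = ", ".join(
        f"{dtype}: {weight_memory_mb(size, dtype):,.0f} MB"
        for dtype in BYTES_PER_PARAM
    )
    print(f"{size:>5} -> {row}")
```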

Architecture

The SmolLM2-CoT-360M model uses a causal language model architecture, in which each token is predicted from the tokens that precede it, optimized for text generation and reasoning tasks. It is loaded through the transformers library, with weights stored in the safetensors format.
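
As a small illustration of what "causal" means here, the sketch below builds the attention mask used by causal language models in general: position i may attend to positions 0 through i and never to later ones. It is a plain-Python toy for intuition, not code taken from this model's implementation.

```python
def causal_mask(seq_len: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


# Visualise the mask for a 4-token sequence: "x" = visible, "." = hidden.
for row in causal_mask(4):
    print(" ".join("x" if allowed else "." for allowed in row))
# x . . .
# x x . .
# x x x .
# x x x x
```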

Training

Fine-tuning SmolLM involves several key steps:

  1. Setting Up the Environment: Install necessary libraries such as transformers, datasets, trl, torch, and wandb.
  2. Loading Pre-trained Models: Use AutoModelForCausalLM and AutoTokenizer to load the model and tokenizer.
  3. Preparing the Dataset: Load and tokenize the dataset, such as Deepthink-Reasoning.
  4. Configuring Training Arguments: Set up parameters like batch size, learning rate, and device settings.
  5. Training: Use SFTTrainer to fine-tune the model.
  6. Saving the Model: Save the fine-tuned model and tokenizer for future use.
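
The steps above can be sketched roughly as follows. This is a minimal outline under several assumptions, not the author's training script: the base checkpoint ID, the dataset ID, the "prompt" and "response" column names, the output directory, and all hyperparameter values are placeholders chosen for illustration. Constructor arguments of SFTTrainer and SFTConfig have changed between trl releases, so check the version you have installed.

```python
# Hypothetical identifiers: adjust to the checkpoint and dataset you actually use.
BASE_MODEL = "HuggingFaceTB/SmolLM2-360M-Instruct"  # assumed base checkpoint
DATASET_ID = "prithivMLmods/Deepthink-Reasoning"    # assumed Hub dataset ID
OUTPUT_DIR = "smollm2-cot-360m-sft"                 # hypothetical output path


def to_messages(example: dict) -> dict:
    """Convert one record into the conversational format SFTTrainer accepts.

    Assumes the record has "prompt" and "response" columns.
    """
    return {
        "messages": [
            {"role": "user", "content": example["prompt"]},
            {"role": "assistant", "content": example["response"]},
        ]
    }


def fine_tune() -> None:
    # Step 1: the libraries are imported here, so they must be installed first.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import SFTConfig, SFTTrainer

    # Step 2: load the pre-trained model and tokenizer.
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

    # Step 3: load the dataset and reshape it; the trainer tokenizes it later.
    dataset = load_dataset(DATASET_ID, split="train")
    dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

    # Step 4: illustrative training arguments (values are placeholders).
    args = SFTConfig(
        output_dir=OUTPUT_DIR,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
        num_train_epochs=1,
        report_to="none",  # set to "wandb" to log runs to Weights & Biases
    )

    # Step 5: fine-tune.
    trainer = SFTTrainer(model=model, args=args, train_dataset=dataset)
    trainer.train()

    # Step 6: save the fine-tuned model and tokenizer.
    trainer.save_model(OUTPUT_DIR)
    tokenizer.save_pretrained(OUTPUT_DIR)
```

Calling fine_tune() starts the run. The batch size and learning rate usually need tuning for the hardware and dataset at hand.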

Guide: Running Locally

To run the SmolLM2-CoT-360M model locally, follow these steps:

  1. Install Libraries:

    pip install transformers datasets trl torch accelerate bitsandbytes wandb
    
  2. Load the Model:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    model_name = "prithivMLmods/SmolLM2-CoT-360M"
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
  3. Select a Device: Use a GPU when one is available; otherwise fall back to the CPU.

    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    
  4. Run Inference:

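    # Build a chat-formatted prompt; add_generation_prompt=True appends the
    # assistant turn marker so the model answers instead of continuing the user turn.
    messages = [{"role": "user", "content": "What is the capital of France?"}]
    input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    # Generate up to 50 new tokens and print the decoded sequence (prompt included).
    outputs = model.generate(inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0]))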
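
The inference step prints the full sequence, which includes the prompt as well as the answer. One possible way to return only the model's reply is sketched below. The helper names new_token_ids and chat are hypothetical, and the function expects the model, tokenizer, and device objects created in steps 2 and 3.

```python
def new_token_ids(output_ids, prompt_length):
    """Drop the prompt tokens that generate() echoes at the start of its output."""
    return output_ids[prompt_length:]


def chat(model, tokenizer, device, question, max_new_tokens=50):
    """Ask one question and return only the model's reply as text."""
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # outputs[0] holds prompt tokens followed by the newly generated tokens.
    reply_ids = new_token_ids(outputs[0], inputs.shape[1])
    return tokenizer.decode(reply_ids, skip_special_tokens=True)
```

With the objects from the earlier steps in scope, a call would look like print(chat(model, tokenizer, device, "What is the capital of France?")).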
    

Cloud GPUs

For faster training and inference than local hardware allows, consider cloud-based GPU resources from providers such as AWS, Google Cloud, or Azure.

License

The SmolLM2-CoT-360M model is licensed under the Apache 2.0 License, allowing for broad use and modification with attribution.
