Reasoning-SmolLM2-135M

prithivMLmods

Introduction

SmolLM2 is a compact family of language models designed for efficient on-device execution, available in 135M, 360M, and 1.7B parameter sizes. The models handle a range of tasks, including reasoning and chat-based applications. Fine-tuning a checkpoint such as this one involves setting up the environment, training the model, and saving the results.

Architecture

The models are distributed for use with the Hugging Face Transformers library as causal language models. They are lightweight, making them suitable for environments with limited computational resources.
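
For orientation, the sketch below inspects the upstream 135M base checkpoint (assumed here to be HuggingFaceTB/SmolLM2-135M) to show that it loads as a standard causal language model and to count its parameters:

    from transformers import AutoConfig, AutoModelForCausalLM

    # Assumed upstream base checkpoint; the fine-tuned variant loads the same way.
    checkpoint = "HuggingFaceTB/SmolLM2-135M"

    # The config exposes the architecture family and its dimensions.
    config = AutoConfig.from_pretrained(checkpoint)
    print(config.model_type, config.hidden_size, config.num_hidden_layers)

    # Counting parameters confirms the small footprint of the 135M variant.
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")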

Training

Training involves the following steps; a consolidated code sketch follows the list:

  1. Set Up the Environment: Install libraries such as transformers, datasets, trl, and torch. Import the necessary modules and detect the available hardware (GPU, MPS, or CPU).
  2. Load the Pre-trained Model and Tokenizer: Use Hugging Face's AutoModelForCausalLM and AutoTokenizer to load pre-trained models. Set up a chat format for chat-based tasks.
  3. Load and Prepare the Dataset: Load datasets using load_dataset and tokenize them using a custom function.
  4. Configure Training Arguments: Set parameters like batch size, learning rate, and optimization settings using TrainingArguments.
  5. Initialize the Trainer: Use SFTTrainer with the model, tokenizer, and training arguments.
  6. Start Training: Execute the train method on the trainer.
  7. Save the Fine-tuned Model: Save the model and tokenizer to a local directory for future use.
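
Steps 1-3 map onto a short script like the one below. This is a minimal sketch, assuming the upstream HuggingFaceTB/SmolLM2-135M base checkpoint, trl's setup_chat_format helper, and a hypothetical conversational dataset with a "messages" column; substitute your own checkpoint and data.

    import torch
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import setup_chat_format

    # Step 1: detect the available hardware.
    if torch.cuda.is_available():
        device = "cuda"
    elif torch.backends.mps.is_available():
        device = "mps"
    else:
        device = "cpu"

    # Step 2: load the pre-trained model and tokenizer, then attach a chat template.
    checkpoint = "HuggingFaceTB/SmolLM2-135M"  # assumed base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    model, tokenizer = setup_chat_format(model, tokenizer)

    # Step 3: load a conversational dataset (hypothetical name) and render each
    # example's messages into a single training string with the chat template.
    dataset = load_dataset("your-username/your-chat-dataset")

    def format_example(example):
        return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

    dataset = dataset.map(format_example)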

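Steps 4-7 continue from that sketch: configure the training arguments, hand everything to SFTTrainer, train, and save. Exact SFTTrainer argument names (for example, tokenizer versus processing_class, or SFTConfig in place of TrainingArguments) differ between trl releases, so treat this as an outline rather than a drop-in script.

    from transformers import TrainingArguments
    from trl import SFTTrainer

    # Step 4: basic training configuration; adjust batch size and learning rate to your hardware.
    training_args = TrainingArguments(
        output_dir="./smollm2-135m-finetuned",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=5e-5,
        num_train_epochs=1,
        logging_steps=10,
    )

    # Steps 5 and 6: initialize the trainer on the formatted "text" column and start training.
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=dataset["train"],
        tokenizer=tokenizer,
    )
    trainer.train()

    # Step 7: save the fine-tuned model and tokenizer to a local directory.
    trainer.save_model("./smollm2-135m-finetuned")
    tokenizer.save_pretrained("./smollm2-135m-finetuned")
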
Guide: Running Locally

  1. Install Required Libraries:

    pip install transformers datasets trl torch accelerate bitsandbytes wandb
    
  2. Load and Test the Model:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "prithivMLmods/Reasoning-SmolLM2-135M"
    device = "cuda" if torch.cuda.is_available() else "cpu"  # use a GPU when available
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    
  3. Run Inference:

    messages = [{"role": "user", "content": "What is the capital of France?"}]
    # Append the assistant turn prefix so the model generates a reply rather than continuing the user turn.
    input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  4. Cloud GPUs: For better performance, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure.

License

The project is licensed under the Apache-2.0 License, allowing for open-source use, distribution, and modification.
