SmolLM2-CoT-360M LLM Model

Introduction

SmolLM2 is a family of compact language models designed to handle a wide range of tasks efficiently. Available in sizes of 135M, 360M, and 1.7B parameters, these models are lightweight enough to run on-device. The SmolLM2-CoT-360M model, built on the 360M size, is aimed at text generation and reasoning tasks.

Architecture

The SmolLM2-CoT-360M model uses a causal (autoregressive) language model architecture tuned for text generation and reasoning tasks. It loads through the transformers library, and its weights are stored in the safetensors format.
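One way to sanity-check the model size is to count the parameters after loading the checkpoint. The following is a minimal sketch, not something taken from the model card: `count_parameters` and `load_and_count` are hypothetical helper names, and the only assumption is that the loaded model exposes the standard PyTorch `parameters()` method.

```python
def count_parameters(model):
    """Return the total number of parameters in a PyTorch-style model."""
    # Every torch.nn.Module exposes parameters(); each tensor reports its size via numel().
    return sum(p.numel() for p in model.parameters())


def load_and_count(model_name="prithivMLmods/SmolLM2-CoT-360M"):
    """Download the checkpoint and report its parameter count (needs network access)."""
    # Imported lazily so count_parameters stays usable without transformers installed.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_name)
    return count_parameters(model)
```

If the checkpoint matches its name, `load_and_count()` should report a figure in the region of 360 million.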

Training

Fine-tuning SmolLM2 involves several key steps:

Setting Up the Environment: Install necessary libraries such as transformers, datasets, trl, torch, and wandb.
Loading Pre-trained Models: Use AutoModelForCausalLM and AutoTokenizer to load the model and tokenizer.
Preparing the Dataset: Load and tokenize the dataset, such as Deepthink-Reasoning.
Configuring Training Arguments: Set up parameters like batch size, learning rate, and device settings.
Training: Use SFTTrainer to fine-tune the model.
Saving the Model: Save the fine-tuned model and tokenizer for future use.
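The steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the recipe actually used to train SmolLM2-CoT-360M: the base checkpoint and dataset identifiers are left as parameters, the column names `prompt` and `response` are guesses about the dataset schema, the hyperparameters are placeholders, and `format_example` and `fine_tune` are hypothetical helper names. It also assumes a recent trl release in which `SFTTrainer` accepts `processing_class`.

```python
def format_example(row, prompt_key="prompt", response_key="response"):
    """Convert one dataset row into the conversational format SFTTrainer understands."""
    # Column names are assumptions; adjust them to the dataset's actual schema.
    return {
        "messages": [
            {"role": "user", "content": row[prompt_key]},
            {"role": "assistant", "content": row[response_key]},
        ]
    }


def fine_tune(base_model, dataset_id, output_dir="smollm2-cot-sft"):
    """Load, fine-tune, and save a causal LM (needs a network connection and ideally a GPU)."""
    # Heavy imports live here so format_example can be used on its own.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import SFTConfig, SFTTrainer

    # Loading the pre-trained model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # Preparing the dataset: map every row to a list of chat messages
    dataset = load_dataset(dataset_id, split="train")
    dataset = dataset.map(format_example, remove_columns=dataset.column_names)

    # Configuring training arguments (placeholder values, not tuned)
    args = SFTConfig(
        output_dir=output_dir,
        per_device_train_batch_size=4,
        learning_rate=2e-5,
        num_train_epochs=1,
        report_to="none",  # switch to "wandb" to log runs to Weights & Biases
    )

    # Training; older trl releases name the last argument `tokenizer` instead
    trainer = SFTTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        processing_class=tokenizer,
    )
    trainer.train()

    # Saving the fine-tuned model and tokenizer
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling `fine_tune(base_model, dataset_id)` with a SmolLM2 checkpoint and the full Hub identifier of the reasoning dataset would run all six steps in order. The tokenization step is left to `SFTTrainer`, which applies the tokenizer's chat template to the `messages` field.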

Guide: Running Locally

To run the SmolLM2-CoT-360M model locally, follow these steps:

Install Libraries:

pip install transformers datasets trl torch accelerate bitsandbytes wandb
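After installation, a quick check can confirm that each package is importable before any model is downloaded. This is a small sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the package list simply mirrors the install command above.

```python
import importlib.util

# Same package names as in the pip command above.
REQUIRED = ("transformers", "datasets", "trl", "torch", "accelerate", "bitsandbytes", "wandb")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    # find_spec returns None for a top-level package that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All packages found." if not missing else f"Missing: {', '.join(missing)}")
```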

Load the Model:

from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "prithivMLmods/SmolLM2-CoT-360M"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Set Up Environment: Detect the available hardware and move the model to a GPU when one is present.

import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

Run Inference:

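messages = [{"role": "user", "content": "What is the capital of France?"}]
# add_generation_prompt=True appends the assistant turn marker so the model answers the question
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50)
# The decoded output contains the prompt followed by the generated answer
print(tokenizer.decode(outputs[0]))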
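The snippet above prints the prompt together with the answer, special tokens included. A helper that returns only the newly generated text could look like the sketch below. `generate_reply` is a hypothetical name, and the approach (slicing off the prompt tokens before decoding) is one common pattern rather than anything prescribed by the model card.

```python
def generate_reply(model, tokenizer, question, device="cpu", max_new_tokens=50):
    """Return only the model's answer to `question`, without the echoed prompt."""
    messages = [{"role": "user", "content": question}]
    # add_generation_prompt=True ends the prompt with the assistant turn marker.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # The chat template already inserted special tokens, so do not add them again.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns prompt + continuation; keep only the continuation.
    prompt_length = inputs["input_ids"].shape[-1]
    new_tokens = outputs[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the model, tokenizer, and device from the earlier steps in scope, `print(generate_reply(model, tokenizer, "What is the capital of France?", device))` would print just the answer.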

Cloud GPUs

For faster training and inference, consider renting GPU instances from a cloud provider such as AWS, Google Cloud, or Azure.

License

The SmolLM2-CoT-360M model is licensed under the Apache 2.0 License, which permits use, modification, and redistribution provided the license and attribution notices are retained.
