Reasoning-SmolLM2-135M
Introduction
SmolLM2 is a compact family of language models designed for efficient on-device execution, available in 135M, 360M, and 1.7B parameter sizes. The models handle a range of tasks, including reasoning and chat-based applications. Fine-tuning them involves setting up the environment, training the model, and saving the results.
Architecture
The models utilize the Hugging Face Transformers library, supporting causal language modeling tasks. They are designed to be lightweight, making them suitable for environments with limited computational resources.
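As a quick check of that footprint, the minimal sketch below (assuming the checkpoint referenced later in this card) loads the model with Transformers and counts its parameters:

```python
# Minimal sketch: confirm the model's small footprint by counting parameters.
from transformers import AutoConfig, AutoModelForCausalLM

checkpoint = "prithivMLmods/Reasoning-SmolLM2-135M"

config = AutoConfig.from_pretrained(checkpoint)
print(config.model_type, config.num_hidden_layers, config.hidden_size)

model = AutoModelForCausalLM.from_pretrained(checkpoint)
num_params = sum(p.numel() for p in model.parameters())
print(f"~{num_params / 1e6:.0f}M parameters")
```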
Training
Training involves several steps:
- Setting Up the Environment: Install libraries like `transformers`, `datasets`, `torch`, and others. Import the necessary modules and detect the available hardware (GPU, MPS, or CPU).
- Load the Pre-trained Model and Tokenizer: Use Hugging Face's `AutoModelForCausalLM` and `AutoTokenizer` to load pre-trained models, and set up a chat format for chat-based tasks.
- Load and Prepare the Dataset: Load datasets using `load_dataset` and tokenize them using a custom function.
- Configure Training Arguments: Set parameters like batch size, learning rate, and optimization settings using `TrainingArguments`.
- Initialize the Trainer: Use `SFTTrainer` with the model, tokenizer, and training arguments.
- Start Training: Execute the `train` method on the trainer.
- Save the Fine-tuned Model: Save the model and tokenizer to a local directory for future use (a consolidated code sketch follows this list).
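Taken together, the steps above can be sketched as a single script. The following is only a hedged outline, not the recipe used to produce this checkpoint: the base checkpoint (`HuggingFaceTB/SmolLM2-135M`), the dataset (`HuggingFaceTB/smoltalk`), and all hyperparameters are illustrative assumptions, and `SFTTrainer` argument names vary slightly across `trl` versions.

```python
# Hedged fine-tuning sketch; checkpoint, dataset, and hyperparameters are placeholders.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer, setup_chat_format

# 1. Detect the available hardware (GPU, MPS, or CPU).
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)

# 2. Load the pre-trained model and tokenizer, then attach a chat template.
checkpoint = "HuggingFaceTB/SmolLM2-135M"  # assumed base model, not confirmed by this card
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model, tokenizer = setup_chat_format(model, tokenizer)

# 3. Load a conversational dataset (placeholder name).
dataset = load_dataset("HuggingFaceTB/smoltalk", "everyday-conversations")

# 4. Configure training arguments (illustrative values only).
training_args = TrainingArguments(
    output_dir="./smollm2-finetuned",
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    num_train_epochs=1,
    logging_steps=10,
)

# 5. Initialize the trainer with the model, tokenizer, and training arguments.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,  # newer trl releases name this `processing_class`
)

# 6. Train, then save the fine-tuned model and tokenizer locally.
trainer.train()
trainer.save_model("./smollm2-finetuned")
tokenizer.save_pretrained("./smollm2-finetuned")
```

The small parameter count is what keeps a run like this practical on modest hardware.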
Guide: Running Locally
- Install Required Libraries (an optional `bitsandbytes` quantization sketch follows this list):

```bash
pip install transformers datasets trl torch accelerate bitsandbytes wandb
```
- Load and Test the Model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/Reasoning-SmolLM2-135M"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
```
- Run Inference:

```python
messages = [{"role": "user", "content": "What is the capital of France?"}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))
```
- Cloud GPUs: For better performance, consider using cloud-based GPUs from providers like AWS, Google Cloud, or Azure.
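Because `bitsandbytes` appears in the install step, here is an optional, purely illustrative sketch of loading the model with 4-bit quantization on a CUDA GPU. The quantization settings are assumptions rather than recommendations from this card, and they matter mostly for the larger SmolLM2 variants:

```python
# Optional, illustrative only: 4-bit quantized loading via bitsandbytes (CUDA required).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

checkpoint = "prithivMLmods/Reasoning-SmolLM2-135M"

# Assumed quantization settings; adjust as needed.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config=bnb_config,
    device_map="auto",  # place weights on the available GPU automatically
)
```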
License
The project is licensed under the Apache-2.0 License, allowing for open-source use, distribution, and modification.