SmolLM2-CoT-360M-GGUF
prithivMLmods

Introduction
SmolLM2 is a family of compact language models designed for efficiency and versatility, with parameter sizes of 135M, 360M, and 1.7B. These models are suitable for solving various tasks and are lightweight enough for on-device execution.
Architecture
The SmolLM2 models leverage a compact transformer architecture optimized for text generation and reasoning tasks. The specific variant discussed here is the SmolLM2-CoT-360M, which incorporates techniques for chain-of-thought reasoning.
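Because this variant is tuned for chain-of-thought reasoning, prompting it to think step by step is the natural way to exercise it. The following is a minimal sketch, assuming the model ships a chat template usable via `tokenizer.apply_chat_template`; the prompt wording and generation settings are illustrative, not values from the card.

```python
# Chain-of-thought prompting sketch; assumes the tokenizer provides a chat
# template (not stated on the card). Prompt and settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prithivMLmods/SmolLM2-CoT-360M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": "A train covers 60 km in 45 minutes. "
               "What is its average speed in km/h? Think step by step.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)
# Drop the prompt tokens so only the model's reasoning and answer print.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```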
Training
Training SmolLM2 models involves a structured process:
- Environment Setup: Install the necessary Python libraries, such as `transformers`, `datasets`, `trl`, `torch`, `accelerate`, `bitsandbytes`, and `wandb`.
- Loading Pre-trained Models and Tokenizers: Use Hugging Face's `AutoModelForCausalLM` and `AutoTokenizer`.
- Dataset Preparation: Load and tokenize the `Deepthink-Reasoning` dataset using Hugging Face `datasets`.
- Training Configuration: Define parameters such as batch size and learning rate using `TrainingArguments`.
- Model Training: Use `SFTTrainer` to fine-tune the model (a sketch of the full pipeline follows this list).
- Model Saving: Save the fine-tuned model locally for future use.
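The sketch below strings these steps together. It is a minimal, illustrative pipeline, not the exact training recipe: the dataset id `prithivMLmods/Deepthink-Reasoning`, its column names, and every hyperparameter are assumptions, and the `SFTTrainer` keyword arguments shown match older `trl` releases (newer ones move `dataset_text_field` into `trl.SFTConfig`).

```python
# Minimal SFT pipeline sketch; the dataset id, column names, and all
# hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "prithivMLmods/SmolLM2-CoT-360M"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dataset Preparation: flatten each record into one training string.
dataset = load_dataset("prithivMLmods/Deepthink-Reasoning", split="train")

def to_text(example):
    # Column names are assumed; adjust to the dataset's actual schema.
    return {"text": example["prompt"] + "\n" + example["response"]}

dataset = dataset.map(to_text)

# Training Configuration
args = TrainingArguments(
    output_dir="smollm2-cot-360m-sft",
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    num_train_epochs=1,
    logging_steps=10,
    report_to="wandb",  # optional experiment tracking
)

# Model Training (on newer trl, pass dataset_text_field via SFTConfig instead)
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    dataset_text_field="text",
    tokenizer=tokenizer,
)
trainer.train()

# Model Saving
trainer.save_model("smollm2-cot-360m-sft")
```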
Guide: Running Locally
- Install Required Libraries: Use pip to install the following packages:
  ```
  pip install transformers datasets trl torch accelerate bitsandbytes wandb
  ```
- Load Model and Tokenizer:
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model = AutoModelForCausalLM.from_pretrained("prithivMLmods/SmolLM2-CoT-360M")
  tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/SmolLM2-CoT-360M")
  ```
- Run Inference (a lower-memory quantized loading sketch follows this list):
  ```python
  import torch

  # Move the model to the GPU if one is available.
  device = "cuda" if torch.cuda.is_available() else "cpu"
  model.to(device)

  input_text = "What is the capital of France?"
  inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
  outputs = model.generate(inputs, max_new_tokens=50)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```
- Consider Cloud GPUs: For optimal performance, use cloud platforms like AWS, Google Cloud, or Azure, which offer GPU resources.
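Since `bitsandbytes` is among the required packages, the model can also be loaded quantized when GPU memory is tight. A minimal sketch for the inference step above, assuming a CUDA GPU; the 4-bit settings are illustrative choices, not values from the card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit settings; tune for your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "prithivMLmods/SmolLM2-CoT-360M",
    quantization_config=bnb_config,
    device_map="auto",  # requires accelerate
)
```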
License
The SmolLM2-CoT-360M-GGUF model is licensed under the Apache-2.0 License, allowing for use, modification, and distribution with proper attribution and without warranty.