SocraticLM
CogBase-USTC
Introduction
SocraticLM is a fine-tuned model based on Qwen2.5-Math-7B-Instruct, designed for educational use, particularly for offering Socratic-style guidance on mathematical problems. The model can also solve mathematical problems on its own and supports both English and Chinese.
Architecture
SocraticLM is built with the transformers library, using Qwen2.5-Math-7B-Instruct as its base model. It is intended for text-generation tasks, with a focus on educational scenarios.
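If you want to verify the base architecture before downloading the full weights, the configuration can be inspected on its own. Below is a minimal sketch using the standard transformers AutoConfig API; the expected values reflect the Qwen2.5 7B family and are stated as assumptions, not taken from the model card:

```python
from transformers import AutoConfig

# Download only the model configuration, not the weights.
config = AutoConfig.from_pretrained("CogBase-USTC/SocraticLM", trust_remote_code=True)

print(config.model_type)         # expected "qwen2" for the Qwen2.5 family
print(config.num_hidden_layers)  # transformer depth of the 7B base model
print(config.hidden_size)        # hidden (embedding) width
```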
Training
The model was trained with the following hyperparameters (see the illustrative TrainingArguments sketch after this list):
- Learning Rate: 1e-05
- Train Batch Size (per device): 8
- Seed: 42
- Distributed Type: Multi-GPU
- Number of Devices: 4
- Total Train Batch Size: 32
- Total Eval Batch Size: 32
- Optimizer: ADAMW_TORCH with betas=(0.9, 0.999) and epsilon=1e-08
- Learning Rate Scheduler Type: Cosine
- Warmup Steps: 20
Framework versions used include Transformers 4.46.2, PyTorch 2.5.1+cu124, Datasets 3.1.0, and Tokenizers 0.20.3.
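For illustration, the reported hyperparameters map roughly onto the following Hugging Face TrainingArguments configuration. This is a reconstruction, not the authors' training script; output_dir, num_train_epochs, and the bf16 flag are assumptions:

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported setup (not the authors' script).
training_args = TrainingArguments(
    output_dir="./socraticlm-sft",  # placeholder path, not from the model card
    num_train_epochs=3,             # placeholder; epochs are not reported
    learning_rate=1e-5,
    per_device_train_batch_size=8,  # 8 per device x 4 GPUs = total batch size 32
    per_device_eval_batch_size=8,   # total eval batch size 32
    seed=42,
    optim="adamw_torch",            # AdamW with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_steps=20,
    bf16=True,                      # assumption, matching the bfloat16 inference dtype
)
```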
Guide: Running Locally
Basic Steps:

- Install Required Libraries: Make sure the necessary Python packages are installed:

  ```bash
  pip install torch transformers
  ```

- Load the Model: Use the following Python code to load the model and tokenizer:

  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM

  tokenizer = AutoTokenizer.from_pretrained("CogBase-USTC/SocraticLM")
  model = AutoModelForCausalLM.from_pretrained(
      "CogBase-USTC/SocraticLM",
      torch_dtype=torch.bfloat16,  # load weights in bfloat16 to halve memory use
      device_map="auto",           # distribute layers across available devices
      trust_remote_code=True,
  )
  ```

- Generate Text: Prepare your input messages and generate a response:

  ```python
  messages = [
      {"role": "system", "content": "Please analyse and solve the following problem step by step."},
      {"role": "user", "content": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"},
  ]

  # Render the chat template into a prompt string, then tokenize it.
  prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  inputs = tokenizer.encode(prompt, return_tensors="pt")

  outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=4096)
  print(tokenizer.decode(outputs[0]))
  ```
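As noted in the Introduction, the model also accepts Chinese input. The sketch below reuses the tokenizer and model loaded above; the Chinese system and user messages are illustrative examples written for this guide (a translation of the prompt above), not prompts documented by the authors:

```python
# Illustrative Chinese-language prompt (not from the official model card).
messages = [
    {"role": "system", "content": "请逐步分析并解决下面的问题。"},
    {"role": "user", "content": "Natalia 在四月把发夹卖给了 48 位朋友，五月卖出的数量是四月的一半。四月和五月她一共卖出了多少个发夹？"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=4096)
print(tokenizer.decode(outputs[0]))
```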
Cloud GPUs: For good performance, consider a cloud GPU service such as AWS, GCP, or Azure; in bfloat16, the 7B model needs roughly 15 GB of GPU memory for the weights alone.
License
The model is distributed under a license categorized as "other." For specific licensing details, refer to the model's official repository or documentation.