deepseek math 7b instruct

deepseek-ai

Introduction

DeepSeekMath is a language model specialized for mathematical problem-solving. It generates step-by-step solutions to mathematical questions posed in natural language and supports both English and Chinese.

Architecture

DeepSeekMath is a causal (decoder-only) language model that can be run with Hugging Face Transformers on PyTorch. AutoTokenizer converts the input text into token IDs, and AutoModelForCausalLM generates the response one token at a time.
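
A causal language model generates autoregressively: at each step it scores every vocabulary token given the tokens so far and appends one of them. The sketch below shows the simplest strategy, greedy decoding (always take the highest-scoring token), in plain Python. `greedy_decode`, `next_token_logits` and `toy_logits` are hypothetical stand-ins for the model's forward pass, and the strategy actually used by `model.generate` is determined by the model's generation config, which may differ from greedy.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    prompt_ids: Sequence[int],
    next_token_logits: Callable[[List[int]], Sequence[float]],
    eos_token_id: int,
    max_new_tokens: int,
) -> List[int]:
    """Append the highest-scoring token until EOS or the token budget is reached.

    Returns only the newly generated IDs (EOS included if it was produced).
    """
    ids = list(prompt_ids)
    new_ids: List[int] = []
    for _ in range(max_new_tokens):
        logits = next_token_logits(ids)  # one score per vocabulary entry
        token = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(token)
        new_ids.append(token)
        if token == eos_token_id:
            break
    return new_ids


# Hypothetical toy "model": a 5-token vocabulary where the next token is always
# (last token + 1) mod 5, and token 4 plays the role of end-of-sequence.
VOCAB_SIZE = 5
EOS_ID = 4


def toy_logits(ids: List[int]) -> List[float]:
    nxt = (ids[-1] + 1) % VOCAB_SIZE
    return [1.0 if i == nxt else 0.0 for i in range(VOCAB_SIZE)]


print(greedy_decode([0, 1], toy_logits, EOS_ID, max_new_tokens=10))  # [2, 3, 4]
```

Returning only the new IDs mirrors how the guide slices `outputs[0][input_tensor.shape[1]:]` to drop the prompt before decoding.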

Training

The model is optimized for mathematical reasoning and can be prompted to solve problems with chain-of-thought reasoning: it breaks a complex query into manageable steps and works through them to reach the final answer.
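
Chain-of-thought behaviour is requested through the prompt. The Guide's example appends an instruction to the question asking for step-by-step reasoning and a boxed final answer; a small helper that does the same for any question might look like this. `build_cot_messages` and `COT_SUFFIX` are hypothetical names; the suffix text is the one used in the Guide's example.

```python
from typing import Dict, List

# Instruction text taken from the example prompt in the Guide.
COT_SUFFIX = "Please reason step by step, and put your final answer within \\boxed{}."


def build_cot_messages(question: str) -> List[Dict[str, str]]:
    """Wrap a question as a single user turn that asks for step-by-step reasoning."""
    return [{"role": "user", "content": f"{question.strip()}\n{COT_SUFFIX}"}]


print(build_cot_messages("what is the integral of x^2 from 0 to 2?"))
```

The returned list has the same shape as `messages` in the Guide, so it can be passed to `tokenizer.apply_chat_template` in the same way.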

Guide: Running Locally

To run DeepSeekMath locally, follow these steps:

  1. Install Dependencies: Make sure torch, transformers and accelerate are installed in your Python environment (accelerate is required for device_map="auto" in step 2).

    pip install torch transformers accelerate
    
  2. Load Model and Tokenizer:

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
    
    model_name = "deepseek-ai/deepseek-math-7b-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Load the weights in bfloat16; device_map="auto" places them on the available GPU(s)/CPU.
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
    # Use the generation settings published with the model, and pad with the end-of-sequence token.
    model.generation_config = GenerationConfig.from_pretrained(model_name)
    model.generation_config.pad_token_id = model.generation_config.eos_token_id
    
  3. Generate Responses:

    messages = [
        {"role": "user", "content": "what is the integral of x^2 from 0 to 2?\nPlease reason step by step, and put your final answer within \\boxed{}."}
    ]
    # Apply the model's chat template and append the prompt for the assistant's turn.
    input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    # Raise max_new_tokens if the step-by-step solution is cut off.
    outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
    
    # Decode only the newly generated tokens, skipping the prompt.
    result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
    print(result)
    
  4. Cloud GPUs: If no suitable local GPU is available, consider running the model on a cloud GPU service such as AWS, Google Cloud, or Azure.
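
Because the prompt in step 3 asks the model to put its final answer inside \boxed{}, the answer can be pulled out of the decoded text. A minimal sketch, assuming the answer appears in the last \boxed{...} of the response; `extract_boxed_answer` is a hypothetical helper, not part of Transformers.

```python
from typing import Optional


def extract_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent.

    Braces are counted so nested groups such as \\boxed{\\frac{8}{3}} are kept whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces, e.g. output cut off by max_new_tokens


print(extract_boxed_answer(r"So the integral is \boxed{\frac{8}{3}}."))  # \frac{8}{3}
```

For the example question, the exact value is the integral of x^2 from 0 to 2 = 2^3/3 = 8/3, so a correct response should box a value equal to 8/3 (the model's exact formatting may vary).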

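When choosing a GPU, a rough lower bound on memory comes from the weights alone. The sketch below assumes a nominal 7 billion parameters (as in the model name) and the torch.bfloat16 dtype used in step 2, which stores each parameter in 2 bytes; `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes).

    Activations, the KV cache and framework overhead are not included,
    so actual usage during generation is higher.
    """
    return num_params * bytes_per_param / 1e9


print(weight_memory_gb(7e9))     # bfloat16 (2 bytes/param): 14.0
print(weight_memory_gb(7e9, 4))  # float32 (4 bytes/param): 28.0
```

Loading in bfloat16 rather than float32 halves this figure, which is why step 2 passes torch_dtype=torch.bfloat16.
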
License

The code repository for DeepSeekMath is licensed under the MIT License. Use of the DeepSeekMath models is subject to a separate Model License, which permits commercial use. Details are in the LICENSE-MODEL file.
