DeepSeek LLM 67B Chat

deepseek-ai

Introduction

DeepSeek LLM is an advanced language model with 67 billion parameters, trained on 2 trillion tokens of English and Chinese text. To support research, DeepSeek has open-sourced both the 7B and 67B models in Base and Chat versions.

Architecture

The deepseek-llm-67b-chat model is initialized from deepseek-llm-67b-base and then fine-tuned on additional instruction data.

Training

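The base model was trained from scratch on a dataset of 2 trillion tokens of English and Chinese text, and the chat model builds on it through instruction fine-tuning. This large pretraining corpus underpins the model's ability to generate coherent, contextually relevant text in both languages.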

Guide: Running Locally

To run the DeepSeek LLM model locally, follow these steps:

  1. Environment Setup: Ensure you have Python installed along with PyTorch.
  2. Install Transformers: Run pip install transformers accelerate to install the required libraries (accelerate is needed for device_map="auto" in the next step).
  3. Load the Model:
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
    
    model_name = "deepseek-ai/deepseek-llm-67b-chat"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # bfloat16 halves memory versus float32; device_map="auto" spreads the layers across the available GPUs
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
    # Use the generation defaults published with the model, and reuse the EOS token as the padding token
    model.generation_config = GenerationConfig.from_pretrained(model_name)
    model.generation_config.pad_token_id = model.generation_config.eos_token_id
    
  4. Generate Text: Prepare input messages and generate responses.
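    messages = [{"role": "user", "content": "Who are you?"}]
    # Format the conversation with the model's chat template and append the prompt for the assistant's turn
    input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
    # Decode only the newly generated tokens by skipping the prompt tokens at the start of the output
    result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
    print(result)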
    
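To continue a conversation over several turns, one approach is to append the model's reply as an "assistant" message and then the next "user" message, and pass the extended list to tokenizer.apply_chat_template again. The sketch below assumes the chat template accepts alternating user/assistant roles (the usual Hugging Face convention); extend_conversation is a hypothetical helper, not part of transformers, and the reply string is a placeholder for the result produced in step 4.

```python
from typing import Dict, List

# Each chat turn is a dict with "role" and "content" keys, the same shape
# as the `messages` list passed to tokenizer.apply_chat_template.
Message = Dict[str, str]


def extend_conversation(history: List[Message], assistant_reply: str, next_user_turn: str) -> List[Message]:
    """Hypothetical helper: return a new list with the model's reply and the next user turn appended.

    The input list is left unmodified, so earlier conversation states stay reusable.
    """
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_turn},
    ]


messages = [{"role": "user", "content": "Who are you?"}]
# Placeholder: in practice this would be the decoded `result` from the generation step.
result = "<model reply>"
messages = extend_conversation(messages, result, "What languages can you answer in?")

for m in messages:
    print(m["role"], "->", m["content"])
```

Returning a new list rather than appending in place makes it easy to branch a conversation from an earlier point without copying it by hand.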

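Suggested Cloud GPUs: Even in bfloat16, a 67B-parameter model needs more memory than a single GPU typically provides, so plan for a multi-GPU instance; with device_map="auto", the weights are distributed across the available GPUs. Cloud services such as AWS, Google Cloud, and Azure offer suitable GPU instances.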
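As a rough sizing check: bfloat16 stores each parameter in 2 bytes, so 67 billion parameters need about 67 × 10^9 × 2 bytes ≈ 134 GB for the weights alone. The sketch below turns that arithmetic into a lower bound on GPU count. It assumes 1 GB = 10^9 bytes, uses the nominal 67B parameter count, and ignores activations, the KV cache, and framework overhead, so real deployments need headroom beyond this figure. The 80 GB GPU size is only an example, and weight_memory_gb and min_gpus_for_weights are hypothetical helpers.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory in GB (1 GB = 1e9 bytes) needed just to hold the weights."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: float, bytes_per_param: int, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count: counts weights only, ignoring activations, KV cache and overhead."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


params = 67e9   # nominal parameter count of DeepSeek LLM 67B
bf16_bytes = 2  # torch.bfloat16 uses 2 bytes per value

print(f"bf16 weights: ~{weight_memory_gb(params, bf16_bytes):.0f} GB")
print("80 GB GPUs needed (weights only):", min_gpus_for_weights(params, bf16_bytes, 80))
```

With these assumptions the weights come to about 134 GB, so at least two 80 GB GPUs are needed before accounting for runtime memory.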

License

The code repository is licensed under the MIT License. Use of the DeepSeek LLM models is governed by the Model License, which permits commercial use. For details, see the LICENSE-MODEL file.
