DeepSeek LLM 7B Base
deepseek-ai / DeepSeek LLM
Introduction
DeepSeek LLM is an advanced language model with 7 billion parameters, trained from scratch on a dataset of 2 trillion tokens in English and Chinese. The DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat models are open-sourced to support research within the community.
Architecture
The model, deepseek-llm-7b-base, features 7 billion parameters and utilizes Multi-Head Attention. It has been trained from scratch on 2 trillion tokens.
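Multi-Head Attention runs scaled dot-product attention in parallel over several learned projections of the input. As a rough illustration of the core operation (a minimal single-head sketch in plain Python, not the model's actual implementation):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(query, keys, values):
    # Score each key against the query, scaled by sqrt(d_k)
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted sum of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

In the full model, each attention head applies this computation to its own projected query/key/value slices, and the head outputs are concatenated and projected back.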
Training
DeepSeek LLM was developed using a vast dataset that includes 2 trillion tokens in both English and Chinese. The training process was conducted from scratch, ensuring a robust foundation for various applications.
Guide: Running Locally
To use DeepSeek LLM for text completion, follow these steps:
- Setup Environment: Ensure Python and PyTorch are installed.
- Install Transformers:
pip install transformers
- Load and Use the Model:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# Load the tokenizer and model; device_map="auto" places weights on available devices
model_name = "deepseek-ai/deepseek-llm-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Use the model's shipped generation settings; set the pad token to avoid warnings
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

# Run text completion
text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
- Hardware Requirements: In bfloat16, the 7B model's weights alone occupy roughly 14 GB of GPU memory, so a single modern GPU with at least 16 GB is advisable. For optimal performance, cloud GPUs such as those provided by AWS, Google Cloud, or Azure are recommended.
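As a back-of-the-envelope check, the weight-only memory footprint at a given precision is just parameter count times bytes per parameter. A sketch (weights only; real usage adds activations, the KV cache, and framework overhead):

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    # Weight-only footprint; ignores activations, KV cache, and overhead
    return n_params * bytes_per_param / 1024**3

# Estimate for a 7-billion-parameter model at common precisions
for dtype, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    print(f"{dtype}: {weight_memory_gib(7e9, nbytes):.1f} GiB")
```

This is why bfloat16 (as used in the loading example above) fits comfortably on a single 16 GB GPU, while float32 would not.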
License
The code repository is licensed under the MIT License. However, the use of DeepSeek LLM models is governed by a specific Model License, which supports commercial use. More details can be found in the LICENSE-MODEL.