DeepSeek LLM 7B Base
deepseek-ai / DeepSeek LLM
Introduction
DeepSeek LLM is an advanced language model with 7 billion parameters, trained from scratch on a dataset of 2 trillion tokens in English and Chinese. The DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat models are open-sourced to support research within the community.
Architecture
The model, deepseek-llm-7b-base, features 7 billion parameters and utilizes Multi-Head Attention. It has been trained from scratch on 2 trillion tokens.
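Multi-Head Attention runs scaled dot-product attention in parallel over several learned projections of the input. As a rough illustration of the core operation (a minimal single-head sketch in plain Python, not the model's actual implementation):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(query, keys, values):
    # Score each key against the query, scaled by sqrt(d_k)
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted sum of the value vectors
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]
```

In the full model, each attention head applies this computation to its own projected query/key/value slices, and the head outputs are concatenated and projected back.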
Training
DeepSeek LLM was developed using a vast dataset that includes 2 trillion tokens in both English and Chinese. The training process was conducted from scratch, ensuring a robust foundation for various applications.
Guide: Running Locally
To use DeepSeek LLM for text completion, follow these steps:
- Setup Environment: Ensure Python and PyTorch are installed.
- Install Transformers:
pip install transformers
- Load and Use the Model:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# Load the tokenizer and model; device_map="auto" places weights on available devices
model_name = "deepseek-ai/deepseek-llm-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Use the model's shipped generation settings; set the pad token to avoid warnings
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

# Run text completion
text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
- Hardware Requirements: In bfloat16, the 7B model's weights alone occupy roughly 14 GB of GPU memory, so a single modern GPU with at least 16 GB is advisable. For optimal performance, cloud GPUs such as those provided by AWS, Google Cloud, or Azure are recommended.
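As a back-of-the-envelope check, the weight-only memory footprint at a given precision is just parameter count times bytes per parameter. A sketch (weights only; real usage adds activations, the KV cache, and framework overhead):

```python
def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    # Weight-only footprint; ignores activations, KV cache, and overhead
    return n_params * bytes_per_param / 1024**3

# Estimate for a 7-billion-parameter model at common precisions
for dtype, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    print(f"{dtype}: {weight_memory_gib(7e9, nbytes):.1f} GiB")
```

This is why bfloat16 (as used in the loading example above) fits comfortably on a single 16 GB GPU, while float32 would not.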
License
The code repository is licensed under the MIT License. However, the use of DeepSeek LLM models is governed by a specific Model License, which supports commercial use. More details can be found in the LICENSE-MODEL.