deepthought 8b llama v0.01 alpha
ruliadIntroduction
Deepthought-8B is a compact yet capable reasoning model based on LLaMA-3.1 8B. It is designed to enhance AI reasoning transparency and controllability, offering reasoning capabilities comparable to larger models.
Architecture
Deepthought-8B utilizes a structured approach to problem-solving, producing outputs that document the reasoning process in a JSON format. This makes the model's decision-making more understandable and verifiable. Key features include transparent reasoning, a programmable approach, test-time compute scaling, efficient performance on hardware with 16GB+ VRAM, and structured JSON output.
Training
The model is built to break down its reasoning into clear, documented steps, allowing customizable reasoning patterns without necessitating model retraining. It scales its reasoning depth according to task complexity, enabling efficient performance adjustments.
Guide: Running Locally
Basic Steps
-
Set Up Environment:
- Ensure Python 3.6+ is installed.
- Install necessary libraries:
pip install torch transformers
- Optionally, install Flash Attention 2 for enhanced performance:
pip install flash-attn
-
Configure Environment Variables:
- Set your Hugging Face token:
export HF_TOKEN=your_token_here export HF_HUB_ENABLE_HF_TRANSFER=1
- Set your Hugging Face token:
-
Initialize the Model in Python:
- Use the following code snippet:
from transformers import AutoModelForCausalLM, AutoTokenizer import torch model_name = "ruliad/deepthought-8b-llama-v0.01-alpha" tokenizer = AutoTokenizer.from_pretrained( model_name, add_bos_token=False, trust_remote_code=True, padding="left", torch_dtype=torch.bfloat16, ) model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype=torch.bfloat16, device_map="auto", attn_implementation="flash_attention_2", # Use "eager" if flash_attn is not installed use_cache=True, trust_remote_code=True, )
- Use the following code snippet:
-
Run Inference Example:
- Execute the script:
python deepthought_inference.py
- Execute the script:
Cloud GPUs
Consider using cloud services with GPUs that provide at least 16GB VRAM to efficiently run the model.
License
The Deepthought-8B model is available under a commercial license for enterprise use.