Deepthought-8B LLaMA v0.01 Alpha

ruliad

Introduction

Deepthought-8B is a compact yet capable reasoning model based on LLaMA-3.1 8B. It is designed to enhance AI reasoning transparency and controllability, offering reasoning capabilities comparable to larger models.

Architecture

Deepthought-8B uses a structured approach to problem-solving, producing output that documents its reasoning process in JSON format. This makes the model's decision-making easier to inspect and verify. Key features include transparent reasoning, a programmable approach, test-time compute scaling, efficient performance on hardware with 16GB+ VRAM, and structured JSON output.
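
For illustration only, one shape such output might take is sketched below; the field names here are invented for the example and are not taken from the model's actual schema:

      {
        "reasoning": [
          { "step": 1, "thought": "Restate the problem and identify what is being asked." },
          { "step": 2, "thought": "Work through each sub-problem and check the result." }
        ],
        "answer": "..."
      }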

Training

The model is built to break its reasoning down into clear, documented steps, and its reasoning patterns can be customized without retraining. It scales reasoning depth with task complexity, so harder problems receive more test-time compute than simpler ones.
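
The card does not specify how custom reasoning patterns are supplied; one plausible mechanism, assumed here purely for illustration, is to describe the desired pattern in the system message of the chat template:

      from transformers import AutoTokenizer

      # Hypothetical sketch: request a custom reasoning pattern at inference
      # time via the system message (no retraining involved). Which controls
      # the model actually honors is not documented in this card.
      tokenizer = AutoTokenizer.from_pretrained("ruliad/deepthought-8b-llama-v0.01-alpha")
      messages = [
          {"role": "system", "content": "Reason in numbered steps and verify each step before moving on."},
          {"role": "user", "content": "Is 91 prime?"},
      ]
      prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
      print(prompt)  # inspect the prompt the model will see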

Guide: Running Locally

Basic Steps

  1. Set Up Environment:

    • Ensure Python 3.6+ is installed.
    • Install necessary libraries:
      pip install torch transformers
      
    • Optionally, install Flash Attention 2 for enhanced performance:
      pip install flash-attn
      
  2. Configure Environment Variables:

    • Set your Hugging Face token:
      export HF_TOKEN=your_token_here
      export HF_HUB_ENABLE_HF_TRANSFER=1

    • Note: HF_HUB_ENABLE_HF_TRANSFER=1 enables faster downloads but requires the hf_transfer package (pip install hf_transfer); leave it unset if that package is not installed.
      
  3. Initialize the Model in Python:

    • Use the following code snippet:
      from transformers import AutoModelForCausalLM, AutoTokenizer
      import torch
      
      model_name = "ruliad/deepthought-8b-llama-v0.01-alpha"
      tokenizer = AutoTokenizer.from_pretrained(
          model_name,
          add_bos_token=False,   # the chat template already prepends BOS
          trust_remote_code=True,
          padding_side="left",   # left-pad for decoder-only generation
      )
      
      model = AutoModelForCausalLM.from_pretrained(
          model_name,
          torch_dtype=torch.bfloat16,
          device_map="auto",
          attn_implementation="flash_attention_2",  # Use "eager" if flash_attn is not installed
          use_cache=True,
          trust_remote_code=True,
      )
      
  4. Run Inference Example:

    • Execute the example script from the model repository (a minimal sketch of a comparable loop follows these steps):
      python deepthought_inference.py
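
The contents of deepthought_inference.py are not reproduced in this card. As a rough substitute, a minimal loop in the same spirit, reusing the model and tokenizer from step 3, might look like this (the prompt handling is an assumption; the bundled script may differ):

      # Minimal sketch, reusing `model` and `tokenizer` from step 3.
      messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
      inputs = tokenizer.apply_chat_template(
          messages, add_generation_prompt=True, return_tensors="pt"
      ).to(model.device)
      outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
      # Decode only the newly generated tokens, which should carry the
      # structured (JSON) reasoning followed by the final answer.
      print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))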
      

Cloud GPUs

To run the model efficiently, consider cloud services offering GPUs with at least 16GB of VRAM.

License

The Deepthought-8B model is available under a commercial license for enterprise use.
