DBRX Instruct

Databricks

Introduction

DBRX Instruct is a mixture-of-experts (MoE) large language model (LLM) developed by Databricks that specializes in few-turn interactions. It was released alongside its pretrained counterpart, DBRX Base, under the Databricks Open Model License, and is optimized for English-language text generation and code tasks.

Architecture

DBRX is a transformer-based, decoder-only LLM trained with next-token prediction. It employs a fine-grained MoE architecture with 132 billion total parameters, of which 36 billion are active on any input: each layer has 16 experts and routes each token to 4 of them, yielding far more possible expert combinations than coarser designs such as Mixtral-8x7B and Grok-1 (8 experts, 2 active), which Databricks found improves model quality. The model uses rotary position encodings (RoPE), gated linear units (GLU), and grouped-query attention (GQA), tokenizes with a converted version of the GPT-4 tokenizer (from the tiktoken repository), and supports a context length of up to 32,768 tokens.
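
To see why fine-grained routing matters, the short sketch below counts the possible expert subsets a router can pick per token, using DBRX's published configuration (16 experts, 4 active) against the 8-expert, 2-active layout of Mixtral-8x7B and Grok-1:
    # Possible expert subsets per token for a top-k MoE router.
    from math import comb

    dbrx = comb(16, 4)    # 16 experts, 4 active -> 1820 subsets
    coarse = comb(8, 2)   # 8 experts, 2 active  -> 28 subsets
    print(dbrx, coarse, dbrx // coarse)  # 1820 28 65 (~65x more combinations)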

Training

DBRX was pretrained on 12 trillion tokens of carefully curated, predominantly English text and code, processed and managed with Databricks tools. Training used curriculum learning, changing the data mix over the course of training in ways Databricks reports substantially improved model quality. The model was built with the Composer, Streaming, MegaBlocks, and LLM Foundry libraries on Databricks infrastructure.
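
Databricks has not published the schedule itself, so as a purely illustrative sketch, a curriculum over a data mix can be modeled as sampling weights that shift as training progresses; the sources, weights, and schedule below are hypothetical, not DBRX's actual mix:
    # Hypothetical curriculum: sampling weights shift across training.
    import random

    def mix_weights(progress):
        # progress runs 0.0 -> 1.0; weight moves from raw web text
        # toward curated text and code (always sums to 1.0).
        return [0.7 - 0.3 * progress, 0.2 + 0.1 * progress, 0.1 + 0.2 * progress]

    sources = ["web_text", "curated_text", "code"]
    for progress in (0.0, 0.5, 1.0):
        picked = random.choices(sources, weights=mix_weights(progress))[0]
        print(progress, mix_weights(progress), picked)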

Guide: Running Locally

To run the DBRX model locally:

  1. Install Required Packages (hf_transfer and the environment variable below enable faster downloads from the Hugging Face Hub):
    pip install "transformers>=4.40.0"
    pip install hf_transfer
    export HF_HUB_ENABLE_HF_TRANSFER=1

  2. Request Access: DBRX is a gated repository; request access on the model page and create a Hugging Face access token with read permission.
  3. Load the Model (in bf16, the 132B parameters occupy roughly 264 GB of memory, so multiple GPUs or offloading are typically required; a quantized-loading sketch follows this list):
    from transformers import AutoTokenizer, AutoModelForCausalLM
    import torch

    # Replace hf_YOUR_TOKEN with your Hugging Face access token.
    tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", token="hf_YOUR_TOKEN")
    # device_map="auto" shards the weights across available devices.
    model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, token="hf_YOUR_TOKEN")

  4. Run Inference (a sampling variant is sketched after this list):
    input_text = "What does it take to build a great LLM?"
    messages = [{"role": "user", "content": input_text}]
    # apply_chat_template with return_dict=True returns a dict holding
    # input_ids and attention_mask, ready to unpack into generate().
    inputs = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

    outputs = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(outputs[0]))

  5. Cloud GPUs: If local hardware is insufficient, consider cloud GPU services such as AWS, Google Cloud, or Azure.
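
If the full bf16 weights do not fit in memory (132B parameters at 2 bytes each is roughly 264 GB), quantized loading can shrink the footprint. A minimal sketch for step 3 using transformers' BitsAndBytesConfig, assuming bitsandbytes is installed (pip install bitsandbytes); exact savings and quality trade-offs depend on your setup:
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    import torch

    # 4-bit quantization cuts weight memory to roughly a quarter of bf16.
    quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
    model = AutoModelForCausalLM.from_pretrained(
        "databricks/dbrx-instruct",
        quantization_config=quant_config,
        device_map="auto",
        token="hf_YOUR_TOKEN",
    )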
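
For step 4, decoding behavior can be tuned through standard generate() arguments; the sampling values below are illustrative, not DBRX-specific recommendations:
    # Sampling-based generation; temperature and top_p are illustrative.
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.95)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))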

License

DBRX Instruct is released under the Databricks Open Model License. Usage must comply with this license and the Acceptable Use Policy.
