Mixtral 8x7B Instruct v0.1

mistralai

Introduction

Mixtral-8x7B-Instruct is a Large Language Model (LLM) developed by Mistral AI. It is a generative Sparse Mixture of Experts model that outperforms Llama 2 70B on most benchmarks. The model handles five languages: French, Italian, German, Spanish, and English.

Architecture

Mixtral-8x7B-Instruct is a decoder-only transformer whose feed-forward layers are Sparse Mixture of Experts blocks: each layer holds 8 expert networks, and a router selects 2 of them for every token. The model uses Mistral's tokenizer and can be run through common inference stacks, including Hugging Face's Transformers library. It is designed for text generation and should be prompted with a specific instruction format to produce optimal outputs.
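
As a quick way to inspect this structure, the Hugging Face configuration for the checkpoint exposes the expert counts. This is a minimal sketch; the attribute names are those used by Transformers' MixtralConfig and assume a Transformers version that includes the Mixtral architecture:

    from transformers import AutoConfig
    
    config = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
    print(config.num_local_experts)    # 8 expert feed-forward blocks per layer
    print(config.num_experts_per_tok)  # 2 experts routed to each token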

Training

Mixtral-8x7B-Instruct-v0.1 is fine-tuned from the pretrained Mixtral-8x7B base model to follow instructions. When prompting it, the instruction template, which wraps user turns in [INST] and [/INST] markers, must be strictly followed, otherwise the model will generate sub-optimal outputs. The model does not include any built-in moderation mechanisms.
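
As a sketch of what the required format looks like, apply_chat_template can render a conversation to the raw prompt string for inspection; passing tokenize=False returns the string rather than token IDs, and the example messages are placeholders:

    from transformers import AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
    messages = [
        {"role": "user", "content": "Hello, who are you?"},
        {"role": "assistant", "content": "I am Mixtral."},
        {"role": "user", "content": "What can you do?"},
    ]
    # Prints roughly: "<s>[INST] Hello, who are you? [/INST]I am Mixtral.</s>[INST] What can you do? [/INST]"
    print(tokenizer.apply_chat_template(messages, tokenize=False))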

Guide: Running Locally

  1. Prerequisites:

    • Install the Hugging Face Transformers library (an example pip command follows this list).
    • Ensure access to a compatible GPU with enough memory for inference.
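
    For example, the libraries used in the following steps can be installed with pip; accelerate is assumed here because the device_map="auto" option in step 2 relies on it:

    pip install transformers accelerate torch
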
  2. Loading the Model:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch
    
    model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Load the weights in half precision and let Accelerate place them on the available GPU(s).
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
    
  3. Generating Text:

    messages = [
        {"role": "user", "content": "What is your favourite condiment?"},
        {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
        {"role": "user", "content": "Do you have mayonnaise recipes?"}
    ]
    
    # apply_chat_template wraps the conversation in the [INST] ... [/INST] markers expected by the model
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
    outputs = model.generate(input_ids, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    
  4. Optimizations:

    • Use float16 precision to reduce memory usage.
    • Consider 8-bit or 4-bit quantization with bitsandbytes (BitsAndBytesConfig).
    • Enable Flash Attention 2 for possible throughput gains (a combined sketch follows this list).
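
    A minimal sketch combining 4-bit quantization and Flash Attention 2, assuming the bitsandbytes and flash-attn packages are installed (recent Transformers versions expose Flash Attention 2 through the attn_implementation argument):

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    import torch
    
    # Quantize the weights to 4-bit NF4 and keep compute in float16.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mixtral-8x7B-Instruct-v0.1",
        quantization_config=quant_config,
        attn_implementation="flash_attention_2",
        device_map="auto",
    )
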
  5. Cloud GPUs:

    • If local hardware is insufficient, consider cloud GPU services such as AWS EC2, Google Cloud, or Azure; in float16 the full model needs roughly 90 GB of GPU memory, so a multi-GPU or quantized setup is usually required.

License

Mixtral-8x7B-Instruct is distributed under the Apache-2.0 license, allowing wide usage and modification with appropriate credit.
