Mixtral 8x22B Instruct v0.1

mistralai

Mixtral-8x22B-Instruct-v0.1 Model Documentation

Introduction

Mixtral-8x22B-Instruct-v0.1 is a large language model (LLM) developed by Mistral AI. It is an instruction-fine-tuned version of the Mixtral-8x22B-v0.1 base model, intended for text generation in multiple languages, including English, Spanish, Italian, German, and French.

Architecture

The model is a Transformer-based sparse mixture-of-experts (MoE) network. Its weights are distributed in the Safetensors format and load through the Hugging Face transformers library. It targets text generation and conversational applications and can be deployed through inference endpoints. Prompts are encoded and decoded either with Mistral's own tokenizers from the mistral-common package or with the Hugging Face tokenizer and its chat template.

Training

Mixtral-8x22B-Instruct-v0.1 has been fine-tuned to follow complex instructions. The fine-tuning also covers tool use and function calling: given a description of the available functions, the model can answer with a structured function call (for example, a request to a weather-lookup function) that the calling application then executes.
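The function-calling flow described above can be sketched in plain Python. This is a minimal illustration, not Mistral's implementation: the tool schema layout, the get_current_weather function, the canned weather data, and the assumed shape of the model's output (a JSON list of objects with "name" and "arguments") are all hypothetical stand-ins chosen for the example.

```python
import json

# Hypothetical tool description in JSON-schema style (illustrative names and fields).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Paris"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}


def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in implementation: returns canned data instead of calling a weather service."""
    return {"location": location, "temperature": 22, "unit": unit}


# Registry mapping tool names to the Python callables that implement them.
TOOLS = {"get_current_weather": get_current_weather}


def dispatch_tool_calls(raw: str) -> list:
    """Parse a JSON list of {"name", "arguments"} objects and run each known tool."""
    results = []
    for call in json.loads(raw):
        func = TOOLS.get(call["name"])
        if func is None:
            raise ValueError(f"unknown tool: {call['name']}")
        results.append(func(**call["arguments"]))
    return results


# Assumed shape of a structured call emitted by the model.
model_output = '[{"name": "get_current_weather", "arguments": {"location": "Paris", "unit": "celsius"}}]'
tool_results = dispatch_tool_calls(model_output)
print(tool_results)
```

In a real application, the tool description would be passed to the model along with the user message, and the tool results would be sent back to the model so it can compose a final answer.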

Guide: Running Locally

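  1. Set Up Environment: Install the required packages using pip. The accelerate package is required for the device_map="auto" option used when loading the model.

    pip install transformers torch accelerate mistral-common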
    
  2. Load the Model: Use Hugging Face's transformers library.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch
    
    model_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
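    # device_map="auto" already places the weights on the available GPUs,
    # so no separate model.to("cuda") call is needed.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")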
    
  3. Prepare Inputs: Use the tokenizer to format and tokenize your inputs.

    chat = [{"role": "user", "content": "Explain Machine Learning to me in a nutshell."}]
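    # Move the tokenized inputs to the device that holds the model's first parameters.
    tokens = tokenizer.apply_chat_template(chat, return_dict=True, return_tensors="pt", add_generation_prompt=True).to(model.device)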
    
  4. Generate Text: Run inference to generate responses.

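    generated_ids = model.generate(**tokens, max_new_tokens=1000, do_sample=True)
    # Decode only the newly generated tokens, skipping the prompt and special tokens.
    prompt_length = tokens["input_ids"].shape[-1]
    result = tokenizer.decode(generated_ids[0][prompt_length:], skip_special_tokens=True)
    print(result)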
    

Cloud GPUs

Mixtral-8x22B-Instruct-v0.1 is too large for a single consumer GPU: at bfloat16 precision the weights alone occupy hundreds of gigabytes. If you lack suitable local hardware, consider multi-GPU cloud instances from providers such as AWS, Google Cloud, or Azure, which supply enough GPU memory to hold the model and keep inference latency low.
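A rough back-of-the-envelope estimate shows why multi-GPU hardware is needed. The sketch below assumes a total of about 141 billion parameters (Mistral's published figure, not stated in this document) and a hypothetical 80 GB accelerator. It counts weights only, so activations and the KV cache would add to the total.

```python
import math

# Assumption: roughly 141 billion total parameters (Mistral's published figure).
TOTAL_PARAMS = 141e9


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(weight_gb: float, gpu_memory_gb: float) -> int:
    """Smallest number of GPUs whose combined memory can hold the weights."""
    return math.ceil(weight_gb / gpu_memory_gb)


# bfloat16 stores each parameter in 2 bytes.
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)

# Hypothetical 80 GB accelerator; substitute the memory size of your own hardware.
gpus_needed = min_gpus(bf16_gb, 80)

print(f"bf16 weights: {bf16_gb:.0f} GB, at least {gpus_needed} x 80 GB GPUs")
```

Under these assumptions the bfloat16 weights come to about 282 GB, which is more than any single 80 GB device can hold and motivates spreading the model across several GPUs with device_map="auto".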

License

Mixtral-8x22B-Instruct-v0.1 is licensed under the Apache-2.0 License, allowing for both personal and commercial use with minimal restrictions. For more details, refer to the license documentation.
