Mixtral 8x7B Instruct v0.1
Introduction
Mixtral-8x7B-Instruct-v0.1 is a Large Language Model (LLM) developed by Mistral AI. It is a generative Sparse Mixture of Experts model that outperforms Llama 2 70B on most benchmarks. The model supports five languages: French, Italian, German, Spanish, and English.
Architecture
Mixtral-8x7B-Instruct is a decoder-only transformer in which the feed-forward blocks are replaced by sparse Mixture-of-Experts layers. It uses Mistral's custom tokenizer and can be run with several inference backends, including Hugging Face's Transformers library. The model is designed for text generation tasks, and prompts should follow its instruction format to obtain optimal outputs.
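As a quick, non-authoritative check of the sparse Mixture-of-Experts layout described above, the model configuration can be inspected through Transformers; the field names below are the ones exposed by MixtralConfig and are shown here as an illustrative sketch:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
print(config.num_local_experts)    # experts per Mixture-of-Experts layer
print(config.num_experts_per_tok)  # experts routed per token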
Training
The Instruct model is fine-tuned from the pretrained Mixtral-8x7B base model to follow instructions. Prompts must strictly follow the instruction format, which wraps user turns in [INST] and [/INST] markers together with the tokenizer's BOS and EOS special tokens; deviating from this format leads to sub-optimal outputs. The model does not include any built-in moderation mechanisms.
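For reference, the prompt template documented for this model has the following shape; the tokenizer's built-in chat template (used in the generation example below) produces it automatically:

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]

Here <s> and </s> are the BOS and EOS special tokens inserted by the tokenizer, while [INST] and [/INST] are regular strings.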
Guide: Running Locally
Prerequisites:
- Install the Hugging Face Transformers library (see the install commands after this list).
- Ensure access to a compatible GPU (or GPUs) with enough memory for inference.
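A typical setup sketch, assuming a CUDA-enabled PyTorch environment; accelerate is required for device_map="auto" below, and bitsandbytes is only needed for the optional quantized loading mentioned under Optimizations:

pip install transformers accelerate
pip install bitsandbytes  # optional, only for 8-bit/4-bit loading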
Loading the Model:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in half precision and let Accelerate place them across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
Generating Text:
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
# apply_chat_template wraps the conversation in the model's [INST] ... [/INST] format.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Optimizations:
- Use float16 precision to reduce memory usage.
- Consider 8-bit or 4-bit quantization with bitsandbytes to fit the model on smaller GPUs (see the sketch after this list).
- Enable Flash Attention 2 for possible performance benefits on supported GPUs.
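A minimal sketch of the quantized loading path, assuming reasonably recent transformers and bitsandbytes releases; the attn_implementation argument requires the flash-attn package and a supported GPU, so drop it if either is unavailable:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# 4-bit quantization via bitsandbytes; use load_in_8bit=True instead for 8-bit loading.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    attn_implementation="flash_attention_2",  # assumption: flash-attn is installed; remove if not
)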
Cloud GPUs:
- Consider using cloud GPU services like AWS EC2, Google Cloud, or Azure for enhanced computational resources.
License
Mixtral-8x7B-Instruct is distributed under the Apache-2.0 license, which permits broad use and modification with appropriate attribution.