Mixtral 8x7B v0.1
mistralai/Mixtral-8x7B-v0.1
Introduction
Mixtral-8x7B is a Large Language Model (LLM) built as a pretrained generative Sparse Mixture of Experts. It outperforms Llama 2 70B on most benchmarks. The released weights are compatible with the Hugging Face Transformers library and with vLLM serving, although at the time of release the model could not yet be instantiated directly with Hugging Face (HF).
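For vLLM, offline generation looks roughly like the sketch below. This is a minimal sketch, assuming the vllm package is installed and that your GPUs have enough combined memory; tensor_parallel_size=2 is an assumption and depends on your hardware.

from vllm import LLM, SamplingParams

# Assumes `pip install vllm`; tensor_parallel_size is hardware-dependent.
llm = LLM(model="mistralai/Mixtral-8x7B-v0.1", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello my name is"], params)
print(outputs[0].outputs[0].text)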
Architecture
Mixtral-8x7B is built as a Sparse Mixture of Experts model, designed to handle large-scale language tasks efficiently, and it supports multiple languages, including French, Italian, German, Spanish, and English.
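In a sparse Mixture of Experts layer, a learned router sends each token to only a few expert feed-forward networks, so only a fraction of the total parameters is active per token. The following is a minimal sketch of that top-k routing idea in plain PyTorch; the class name SimpleMoELayer, the dimensions, and the simple two-layer experts are illustrative assumptions, not Mixtral's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    # Hypothetical sparse MoE feed-forward layer: each token is routed to its
    # top_k highest-scoring experts, whose outputs are combined with
    # softmax-normalized router weights.
    def __init__(self, hidden_size=64, ffn_size=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, hidden_size)
        scores = self.router(x)                            # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)               # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, k] == e                   # tokens whose k-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Tiny smoke test with random token embeddings:
layer = SimpleMoELayer()
print(layer(torch.randn(5, 64)).shape)  # torch.Size([5, 64])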
Training
The model is pretrained and primarily intended for text generation tasks. It does not currently include moderation mechanisms.
Guide: Running Locally
To run Mixtral-8x7B locally, follow these steps:
- Install the Transformers library:
  pip install transformers
- Load the model and tokenizer:
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "mistralai/Mixtral-8x7B-v0.1"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id)
- Generate text:
  text = "Hello my name is"
  inputs = tokenizer(text, return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=20)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- Optimize for memory (see the sketch after this list):
  - Use half-precision (float16) on GPUs.
  - Use lower precision with bitsandbytes for 8-bit or 4-bit quantization.
  - Enable Flash Attention 2 for enhanced performance.
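The memory options above map onto loading arguments in Transformers. The snippet below is a minimal sketch, assuming a recent Transformers release plus the optional accelerate, bitsandbytes, and flash-attn packages; in practice you would pick one variant rather than loading the model three times.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"

# Variant 1: half precision (float16) spread across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Variant 2: 4-bit quantization via bitsandbytes (use load_in_8bit=True for 8-bit).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Variant 3: float16 with Flash Attention 2 (requires the flash-attn package
# and a supported GPU; argument name applies to recent Transformers versions).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)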
Cloud GPUs
For optimal performance, consider cloud GPU services such as AWS, Google Cloud, or Azure, which offer the high-memory accelerators a model of this size requires.
License
Mixtral-8x7B is released under the Apache 2.0 License, allowing use, distribution, and modification in compliance with the license terms.