Mistral 7B Instruct v0.2 GGUF

TheBloke

Introduction

The Mistral-7B-Instruct-v0.2 model is a fine-tuned large language model that improves upon its predecessor, Mistral-7B-Instruct-v0.1. It is particularly suited to instruction-following tasks and relies on a specific prompt template to perform well in conversational contexts.
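The prompt template wraps each user instruction in [INST] ... [/INST] tags. A minimal helper makes this concrete (the format_prompt function below is our own illustration, not part of any library):

```python
def format_prompt(turns):
    """Build a Mistral-instruct prompt from (user, assistant) turns.

    An assistant value of None marks the turn still awaiting a reply.
    """
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            # Completed assistant replies are closed with </s>
            prompt += f" {assistant}</s>"
    return prompt

print(format_prompt([("Write a story about llamas.", None)]))
# -> <s>[INST] Write a story about llamas. [/INST]
```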

Architecture

The model is built upon the Mistral-7B-v0.1 architecture, featuring:

  • Grouped-Query Attention
  • Sliding-Window Attention
  • Byte-fallback BPE tokenizer

These architectural choices aim to enhance the model's ability to process instructions and generate coherent responses in a conversational format.
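Of these, sliding-window attention restricts each token to attending only to a fixed-size window of recent tokens rather than the full sequence. A toy causal mask illustrates the idea (the window size here is deliberately tiny for illustration; the real model uses a far larger window):

```python
def sliding_window_mask(seq_len, window):
    """Return a seq_len x seq_len mask where entry [i][j] is 1 if
    token i may attend to token j: causal (j <= i) and within the window."""
    return [[1 if 0 <= i - j < window else 0 for j in range(seq_len)]
            for i in range(seq_len)]

for row in sliding_window_mask(5, 3):
    print(row)
```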

Training

The Mistral-7B-Instruct-v0.2 model is fine-tuned from its base version, with a focus on improving performance on instruction-following tasks. The fine-tuning process does not include any moderation mechanism, an area the developers hope to address through community engagement.

Guide: Running Locally

To run the Mistral-7B-Instruct-v0.2 model locally, follow these steps:

  1. Install Required Libraries:
GGUF model files are run with the llama-cpp-python library rather than the standard transformers loader. Install it via:

    pip install llama-cpp-python
    
  2. Download Model:
Use the huggingface-cli tool (installed with pip install huggingface_hub) to download the desired model file, for example:

    huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.2-GGUF mistral-7b-instruct-v0.2.Q4_K_M.gguf --local-dir .
    
  3. Set Up Environment for GPU:
If a GPU is available, use it to accelerate inference. For llama-cpp-python this means installing a build compiled with GPU support for your platform (for example CUDA or ROCm), typically by setting the appropriate build flags at install time.

  4. Run the Model:
    Use the llama-cpp-python library to load and run the model:

    from llama_cpp import Llama

    # Load the quantized model; set n_gpu_layers=0 for CPU-only inference
    llm = Llama(model_path="./mistral-7b-instruct-v0.2.Q4_K_M.gguf",
                n_ctx=32768, n_threads=8, n_gpu_layers=35)
    # The model expects instructions wrapped in [INST] ... [/INST]
    output = llm("<s>[INST] Write a story about llamas. [/INST]", max_tokens=512)
    print(output)
    
  5. Cloud GPUs:
    If local resources are insufficient, consider using cloud-based GPUs for running the model efficiently, such as those provided by AWS, Google Cloud, or Azure.
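The call in step 4 returns an OpenAI-style completion dictionary rather than a plain string, with the generated text under choices[0]["text"]. A sketch of extracting it, using a mocked-up response (the dictionary contents below are illustrative, not real model output):

```python
# Illustrative shape of the dict returned by llm(...) in llama-cpp-python;
# the text and token counts here are made up for this example.
output = {
    "id": "cmpl-example",
    "choices": [{"text": " Once upon a time, a llama...", "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 9},
}

story = output["choices"][0]["text"].strip()
print(story)  # -> Once upon a time, a llama...
```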

License

The Mistral-7B-Instruct-v0.2 model is released under the Apache 2.0 license, allowing for both personal and commercial use with proper attribution.
