Mistral Nemo Instruct 2407

Introduction

Mistral-Nemo-Instruct-2407 is a large language model (LLM) developed by Mistral AI in collaboration with NVIDIA. It is an instruction fine-tuned version of Mistral-Nemo-Base-2407, designed to outperform models of similar size. The model is optimized for text generation and supports multiple languages.

Architecture

Mistral Nemo is a transformer-based model with the following specifications:

  • Layers: 40
  • Dimension: 5,120
  • Head dimension: 128
  • Hidden dimension: 14,336
  • Activation function: SwiGLU
  • Number of heads: 32
  • Number of kv-heads: 8 (GQA)
  • Vocabulary size: Approximately 128k
  • Rotary embeddings: Theta = 1M
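
Taken together, these numbers almost fully determine the model's size. As a quick sanity check, the sketch below estimates the parameter count in plain Python; the biasless projections, untied input/output embeddings, and exact 2**17 vocabulary are assumptions, not stated on the card. Note that 32 heads × 128 head dimension gives an attention width of 4,096, narrower than the 5,120 model dimension.

    # Rough parameter count from the specs above (assumptions: no biases,
    # untied input/output embeddings, norm weights ignored, vocab = 2**17).
    dim, n_layers, hidden = 5120, 40, 14336
    n_heads, n_kv_heads, head_dim = 32, 8, 128
    vocab = 2 ** 17  # "approximately 128k"

    attn = dim * n_heads * head_dim            # query projection
    attn += 2 * dim * n_kv_heads * head_dim    # key and value projections (GQA: 8 kv-heads)
    attn += n_heads * head_dim * dim           # output projection
    mlp = 3 * dim * hidden                     # SwiGLU: gate, up, and down projections

    total = n_layers * (attn + mlp) + 2 * vocab * dim
    print(f"~{total / 1e9:.1f}B parameters")   # ~12.2B, matching Mistral NeMo's 12B billing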

Training

The model was trained with a 128k context window on a mix of multilingual and code data, and it is designed as a drop-in replacement for Mistral 7B. Published benchmarks indicate strong performance across tasks including HellaSwag, Winogrande, OpenBookQA, and CommonSenseQA.
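
In practice, "drop-in replacement" means migrating an existing Mistral 7B setup should only require swapping the checkpoint identifier; a hypothetical illustration (the surrounding loading and serving code is assumed unchanged):

    # Hypothetical migration: only the checkpoint id changes.
    model_id = "mistralai/Mistral-7B-Instruct-v0.3"    # before
    model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # after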

Guide: Running Locally

Basic Steps

  1. Install Mistral Inference:

    pip install mistral_inference
    
  2. Download Model:

    from huggingface_hub import snapshot_download
    from pathlib import Path

    # Create a local directory for the weights, e.g. ~/mistral_models/Nemo-Instruct
    mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-Instruct')
    mistral_models_path.mkdir(parents=True, exist_ok=True)

    # Fetch only the files mistral_inference needs: the model parameters,
    # the consolidated weights, and the Tekken tokenizer.
    snapshot_download(
        repo_id="mistralai/Mistral-Nemo-Instruct-2407",
        allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
        local_dir=mistral_models_path,
    )
    
  3. Run Chat Interface (for a programmatic alternative, see the Python sketch after this list):

    mistral-chat $HOME/mistral_models/Nemo-Instruct --instruct --max_tokens 256 --temperature 0.35
    
  4. Use Transformers (Optional):
    To use the model with Hugging Face transformers instead, install the library from source (the model card notes this is required until a release includes Mistral-Nemo support) and load the checkpoint as in the pipeline sketch after this list.
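
As a programmatic alternative to the mistral-chat CLI in step 3, generation can also be driven from Python. The sketch below follows the mistral_inference / mistral_common API as shown on the model card; the prompt is illustrative, and the path assumes the download location from step 2.

    from pathlib import Path

    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest

    # Same location that snapshot_download wrote to in step 2.
    mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-Instruct')

    tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
    model = Transformer.from_folder(mistral_models_path)

    # Wrap the prompt in an instruct-style chat request and tokenize it.
    request = ChatCompletionRequest(
        messages=[UserMessage(content="How many languages does this model speak?")]
    )
    tokens = tokenizer.encode_chat_completion(request).tokens

    # Mistral recommends lower temperatures (~0.3) for this model than for earlier releases.
    out_tokens, _ = generate(
        [tokens], model, max_tokens=256, temperature=0.35,
        eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
    )
    print(tokenizer.decode(out_tokens[0]))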
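
For step 4, the model card demonstrates use through the transformers text-generation pipeline. A minimal sketch, assuming a transformers build recent enough to include Mistral-Nemo support:

    from transformers import pipeline

    # Chat-style messages; the pipeline applies the model's chat template.
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]

    chatbot = pipeline(
        "text-generation",
        model="mistralai/Mistral-Nemo-Instruct-2407",
        max_new_tokens=128,
    )
    print(chatbot(messages))

As with the CLI, keep the sampling temperature low (around 0.3); Mistral notes that this model behaves best at lower temperatures than earlier Mistral releases.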

Cloud GPUs

A 12B-parameter model needs substantial GPU memory (the bf16 weights alone occupy roughly 24 GB), so for optimal performance consider cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.

License

The Mistral-Nemo-Instruct-2407 model is released under the Apache 2.0 License, allowing for free use and distribution with proper attribution.
