Mistral Nemo Instruct 2407
Introduction
Mistral-Nemo-Instruct-2407 is a large language model (LLM) developed by Mistral AI in collaboration with NVIDIA. It is an instruction fine-tuned version of Mistral-Nemo-Base-2407, designed to outperform models of similar size, and is optimized for text generation across multiple languages.
Architecture
Mistral Nemo is a transformer-based model with the following specifications (a configuration sketch follows the list):
- Layers: 40
- Dimension: 5,120
- Head dimension: 128
- Hidden dimension: 14,336
- Activation function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Vocabulary size: Approximately 128k
- Rotary embeddings: Theta = 1M
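For orientation, these hyperparameters map roughly onto a Hugging Face MistralConfig, as sketched below. The field names, the exact vocabulary size, and the context-length value are assumptions rather than values copied from the official configuration; head_dim in particular requires a recent transformers release.

from transformers import MistralConfig

config = MistralConfig(
    num_hidden_layers=40,             # layers
    hidden_size=5120,                 # model dimension
    head_dim=128,                     # per-head dimension (not hidden_size / num_heads here)
    intermediate_size=14336,          # hidden (feed-forward) dimension
    hidden_act="silu",                # SwiGLU uses a SiLU-gated feed-forward block
    num_attention_heads=32,
    num_key_value_heads=8,            # grouped-query attention (GQA)
    vocab_size=131072,                # "approximately 128k"; exact value assumed
    rope_theta=1_000_000.0,           # rotary embeddings, theta = 1M
    max_position_embeddings=128_000,  # 128k context window; exact value assumed
)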
Training
The model was trained with a 128k context window on a mix of multilingual and code data, and is intended as a drop-in replacement for Mistral 7B. Reported benchmarks indicate strong performance across a range of tasks, including HellaSwag, Winogrande, OpenBookQA, CommonSenseQA, and others.
Guide: Running Locally
Basic Steps
- Install Mistral Inference:
  pip install mistral_inference
- Download Model:
  from huggingface_hub import snapshot_download
  from pathlib import Path
  mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-Instruct')
  mistral_models_path.mkdir(parents=True, exist_ok=True)
  snapshot_download(
      repo_id="mistralai/Mistral-Nemo-Instruct-2407",
      allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
      local_dir=mistral_models_path,
  )
- Run Chat Interface (a programmatic alternative is sketched after these steps):
  mistral-chat $HOME/mistral_models/Nemo-Instruct --instruct --max_tokens 256 --temperature 0.35
- Use Transformers (Optional): To use the model with Hugging Face Transformers, install the library from source and follow the setup guidelines; a minimal pipeline example is sketched after these steps.
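As an alternative to the mistral-chat CLI, generation can also be scripted against the files downloaded above. This is a minimal sketch assuming the mistral_inference and mistral_common Python APIs as published around the model's release; module paths and signatures may differ between package versions.

from pathlib import Path
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

# Same directory as used in the download step above.
mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-Instruct')

tokenizer = MistralTokenizer.from_file(str(mistral_models_path / "tekken.json"))
model = Transformer.from_folder(str(mistral_models_path))

request = ChatCompletionRequest(messages=[UserMessage(content="Summarize Mistral Nemo in one sentence.")])
tokens = tokenizer.encode_chat_completion(request).tokens

out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=256,
    temperature=0.35,  # matches the CLI example above
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.decode(out_tokens[0]))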
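For the optional Transformers route, a minimal sketch using the text-generation pipeline might look like the following; it assumes a transformers version recent enough to support this model and chat-style message inputs.

from transformers import pipeline

chatbot = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",
    max_new_tokens=128,
)
messages = [{"role": "user", "content": "Who are you?"}]
print(chatbot(messages))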
Cloud GPUs
For optimal performance, consider using cloud-based GPUs such as those offered by AWS, Google Cloud, or Azure.
License
The Mistral-Nemo-Instruct-2407 model is released under the Apache 2.0 License, allowing for free use and distribution with proper attribution.