Mistral Nemo Base 2407
Introduction
Mistral-Nemo-Base-2407 is a 12-billion-parameter Large Language Model (LLM) developed jointly by Mistral AI and NVIDIA. It is designed for generative text tasks and outperforms existing models of similar or smaller size. The model is multilingual and released as open source under the Apache 2.0 License.
Architecture
Mistral-Nemo-Base-2407 is a transformer model featuring:
- 40 layers
- Dimensionality of 5,120
- Head dimension of 128
- Hidden (feed-forward) dimension of 14,336
- SwiGLU activation function
- 32 attention heads
- 8 key-value heads (GQA)
- Vocabulary size of approximately 128,000
- Rotary embeddings with theta set at 1 million
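For readers who use the Transformers integration, these hyperparameters map onto the Hugging Face config roughly as follows. This is an illustrative sketch using MistralConfig-style field names, not a copy of the released config.json:

```python
# Illustrative sketch: the architecture above expressed in Hugging Face
# MistralConfig-style field names. Values mirror the spec list; check the
# released config.json for the authoritative numbers.
arch = {
    "num_hidden_layers": 40,      # transformer layers
    "hidden_size": 5120,          # model dimensionality
    "head_dim": 128,              # per-head dimension
    "intermediate_size": 14336,   # SwiGLU hidden dimension
    "hidden_act": "silu",         # the gated activation behind SwiGLU
    "num_attention_heads": 32,    # query heads
    "num_key_value_heads": 8,     # GQA: 4 query heads share each KV head
    "vocab_size": 131072,         # 2**17, i.e. ~128k
    "rope_theta": 1_000_000.0,    # rotary embedding base
}
```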
Training
The model was trained with a 128k context window on a large proportion of multilingual and code data. It is intended as a drop-in replacement for Mistral 7B, offering improved performance.
Guide: Running Locally
Install Mistral Inference
- Install the mistral_inference package:

```
pip install mistral_inference
```
Download Model
- Download the model files with huggingface_hub:

```python
from pathlib import Path

from huggingface_hub import snapshot_download

mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-v0.1')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="mistralai/Mistral-Nemo-Base-2407",
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=mistral_models_path,
)
```
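The allow_patterns filter fetches only what mistral_inference needs: the weights (consolidated.safetensors), the model hyperparameters (params.json), and the Tekken tokenizer (tekken.json).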
Run a Demo
- Run a demo using the CLI:

```
mistral-demo $HOME/mistral_models/Nemo-v0.1
```
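The checkpoint can also be driven from Python. Below is a minimal completion sketch assuming the mistral_inference and mistral_common APIs used in Mistral's published model cards (MistralTokenizer.from_file, Transformer.from_folder, generate); the prompt and sampling settings are illustrative:

```python
from pathlib import Path

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

# Files downloaded in the previous step.
mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-v0.1')

tokenizer = MistralTokenizer.from_file(str(mistral_models_path / "tekken.json"))
model = Transformer.from_folder(str(mistral_models_path))

# Base model: plain text completion, no chat template.
raw_tokenizer = tokenizer.instruct_tokenizer.tokenizer
tokens = raw_tokenizer.encode("The capital of France is", bos=True, eos=False)

out_tokens, _ = generate(
    [tokens],
    model,
    max_tokens=64,
    temperature=0.35,
    eos_id=raw_tokenizer.eos_id,
)
print(raw_tokenizer.decode(out_tokens[0]))
```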
Alternative: Using Transformers
- Install the Hugging Face transformers library from source:

```
pip install git+https://github.com/huggingface/transformers.git
```
- Run the following code to generate text:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Base-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
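The snippet above loads the weights in full precision on CPU by default. On a GPU, a half-precision variant is usually preferable; the sketch below assumes the accelerate package is installed, and the dtype and device_map choices are suggestions rather than requirements from the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Base-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~24 GB of weights for 12B parameters
    device_map="auto",           # requires accelerate; places layers on GPU(s)
)

inputs = tokenizer("Hello my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```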
Cloud GPU Suggestion
A 12-billion-parameter model needs roughly 24 GB of memory for its weights alone in bfloat16, so for optimal performance consider cloud GPUs from providers such as AWS, Google Cloud, or Azure.
License
The Mistral-Nemo-Base-2407 model is released under the Apache 2.0 License, allowing for wide use and distribution.