Mistral Nemo Base 2407


Introduction

Mistral-Nemo-Base-2407 is a 12-billion-parameter Large Language Model (LLM) developed jointly by Mistral AI and NVIDIA. Designed for generative text tasks, it offers superior performance compared to models of similar size. The model supports multiple languages and is released under the Apache 2.0 License.

Architecture

Mistral-Nemo-Base-2407 is a transformer model featuring:

  • 40 layers
  • Dimensionality of 5,120
  • Head dimension of 128
  • Hidden dimension of 14,336
  • SwiGLU activation function
  • 32 attention heads
  • 8 key-value heads (GQA)
  • Vocabulary size of approximately 128,000
  • Rotary embeddings with theta set at 1 million
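
These values can be read directly from the published checkpoint configuration. The sketch below uses the Hugging Face transformers AutoConfig API; the attribute names follow transformers' MistralConfig conventions, which is an assumption about how the checkpoint is exposed in that library rather than something stated on the model card.

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("mistralai/Mistral-Nemo-Base-2407")

    print(config.num_hidden_layers)           # 40 layers
    print(config.hidden_size)                 # dimensionality: 5,120
    print(getattr(config, "head_dim", None))  # head dimension: 128 (recent transformers versions)
    print(config.intermediate_size)           # hidden (feed-forward) dimension: 14,336
    print(config.hidden_act)                  # "silu", the gate activation of the SwiGLU MLP
    print(config.num_attention_heads)         # 32 attention heads
    print(config.num_key_value_heads)         # 8 key-value heads (GQA)
    print(config.vocab_size)                  # 131,072, i.e. approximately 128k
    print(config.rope_theta)                  # rotary theta: 1,000,000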

Training

The model was trained with a 128k context window on data that includes a large proportion of multilingual and code content. It is intended as a drop-in replacement for the Mistral 7B model, offering enhanced performance.

Guide: Running Locally

Install Mistral Inference

  1. Install the mistral_inference package:
    pip install mistral_inference
    

Download Model

  1. Download the model using the huggingface_hub:
    from huggingface_hub import snapshot_download
    from pathlib import Path

    # Create a local directory for the weights
    mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-v0.1')
    mistral_models_path.mkdir(parents=True, exist_ok=True)

    # Fetch only the files mistral_inference needs: the model parameters,
    # the consolidated weights, and the Tekken tokenizer
    snapshot_download(
        repo_id="mistralai/Mistral-Nemo-Base-2407",
        allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
        local_dir=mistral_models_path
    )
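
Optionally, confirm that the three files landed where expected. This check is not part of the official instructions, just a small sanity-check sketch reusing the path defined above:

    for name in ("params.json", "consolidated.safetensors", "tekken.json"):
        status = "ok" if (mistral_models_path / name).exists() else "missing"
        print(f"{name}: {status}")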
    

Run a Demo

  1. Run a demo using the CLI command:
    mistral-demo $HOME/mistral_models/Nemo-v0.1
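
The checkpoint can also be driven from Python instead of the CLI. The following is a hedged sketch of raw-text completion using the mistral_inference and mistral_common Python APIs; the module paths and the encode/generate signatures are assumptions based on recent releases of those packages, so consult their documentation if an import fails.

    from pathlib import Path

    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

    mistral_models_path = Path.home().joinpath('mistral_models', 'Nemo-v0.1')

    # Load the Tekken tokenizer and the weights downloaded earlier
    tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
    model = Transformer.from_folder(str(mistral_models_path))

    # This is a base model, so encode the prompt for raw completion
    raw = tokenizer.instruct_tokenizer.tokenizer
    tokens = raw.encode("Hello my name is", bos=True, eos=False)

    out_tokens, _ = generate(
        [tokens], model, max_tokens=20, temperature=0.0, eos_id=raw.eos_id
    )
    print(raw.decode(out_tokens[0]))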
    

Alternative: Using Transformers

  1. Install the Hugging Face transformers library from source:

    pip install git+https://github.com/huggingface/transformers.git
    
  2. Run the following code to generate text:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mistral-Nemo-Base-2407"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Load the model and tokenize a short prompt
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer("Hello my name is", return_tensors="pt")

    # Greedily generate up to 20 new tokens and decode them
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
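
By default the snippet above loads the weights on CPU in full precision, which is slow and memory-hungry for a 12B model. A common variant, an assumption on my part rather than part of the model card, loads the model in bfloat16 and lets accelerate place it on the available GPU(s):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mistral-Nemo-Base-2407"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # bfloat16 roughly halves memory (about 24 GB for 12B weights);
    # device_map="auto" requires the accelerate package
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    inputs = tokenizer("Hello my name is", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))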
    

Cloud GPU Suggestion

The 12 billion parameters occupy roughly 24 GB in bfloat16 for the weights alone, so for optimal performance consider using cloud GPUs from providers like AWS, Google Cloud, or Azure.

License

The Mistral-Nemo-Base-2407 model is released under the Apache 2.0 License, which permits commercial use, modification, and redistribution.
