Hermes 3 Llama 3.1 70B

NousResearch

Introduction

Hermes 3 is the latest version of the Hermes series of language models by Nous Research. It offers numerous improvements over its predecessor, Hermes 2, including enhanced reasoning, multi-turn conversation capabilities, and advanced roleplaying features.

Architecture

Hermes 3 is a generalist language model designed to align closely with user needs, providing advanced control and steering capabilities. It builds on Hermes 2 with improved function calling, structured output, and code generation skills.
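
As a rough sketch of what tool use looks like in practice (the exact system prompt and tag conventions should be taken from the official Hermes 3 model card; the get_weather tool below is hypothetical), the model is given JSON function signatures in its system prompt and replies with a tagged JSON call:

    # Hypothetical tool advertised to the model; sketch only, consult the
    # model card for the exact system prompt and tag format used in training.
    system_prompt = (
        "You are a function calling AI model. You are provided with function "
        "signatures within <tools></tools> XML tags:\n"
        "<tools>\n"
        '{"name": "get_weather", "description": "Get the current weather for a city", '
        '"parameters": {"type": "object", "properties": {"city": {"type": "string"}}, '
        '"required": ["city"]}}\n'
        "</tools>\n"
        "For each function call, return a JSON object inside <tool_call></tool_call> tags."
    )

    # A response then looks roughly like:
    # <tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>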

Training

The model is trained to excel at general capabilities, often outperforming the Llama-3.1 Instruct models. It uses the ChatML prompt format, which structures multi-turn chat dialogues and matches the format used by OpenAI-compatible chat endpoints. The model is also trained for function calling and structured output scenarios.
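
For reference, a ChatML dialogue wraps each turn in <|im_start|>{role} and <|im_end|> markers. Below is a minimal, illustrative prompt (the system message text is an arbitrary placeholder):

    # Illustrative ChatML prompt; the system message is a placeholder.
    prompt = (
        "<|im_start|>system\n"
        "You are Hermes 3, a helpful and knowledgeable assistant.<|im_end|>\n"
        "<|im_start|>user\n"
        "What is the capital of France?<|im_end|>\n"
        "<|im_start|>assistant\n"
    )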

Guide: Running Locally

To run Hermes 3 locally:

  1. Install Dependencies: Ensure you have pytorch, transformers, bitsandbytes, sentencepiece, protobuf, and flash-attn installed.
  2. Load the Model:
    import torch
    from transformers import AutoTokenizer, LlamaForCausalLM

    # Load the tokenizer, which ships with the ChatML chat template.
    tokenizer = AutoTokenizer.from_pretrained(
        "NousResearch/Hermes-3-Llama-3.1-70B",
        trust_remote_code=True,
    )

    # Load the 70B weights quantized to 4-bit via bitsandbytes, with
    # FlashAttention 2 enabled and layers spread across available GPUs.
    model = LlamaForCausalLM.from_pretrained(
        "NousResearch/Hermes-3-Llama-3.1-70B",
        torch_dtype=torch.float16,
        device_map="auto",
        load_in_8bit=False,
        load_in_4bit=True,
        use_flash_attention_2=True,
    )
    
  3. Run Inference: Use the tokenizer and model to generate responses from input prompts (see the generation sketch after this list).
  4. Cloud GPUs: Consider using cloud services like AWS, GCP, or Azure for access to powerful GPUs.
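
A minimal generation sketch, assuming tokenizer and model are the objects loaded in step 2 (the messages and sampling parameters are illustrative):

    # Build a ChatML prompt from chat messages using the bundled chat template.
    messages = [
        {"role": "system", "content": "You are Hermes 3, a helpful assistant."},
        {"role": "user", "content": "Write a haiku about large language models."},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Sample a response and decode only the newly generated tokens.
    output_ids = model.generate(
        input_ids,
        max_new_tokens=512,
        temperature=0.8,
        do_sample=True,
        repetition_penalty=1.1,
    )
    response = tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    print(response)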

License

The model is released under the Llama 3 license. Please refer to the license terms on the Hugging Face model card for more details.
