Hermes 3 Llama 3.1 8B

NousResearch

Introduction

Hermes 3 is the latest iteration in the Hermes series of language models developed by Nous Research. It focuses on improvements in multi-turn conversation, roleplaying, reasoning, and long-context coherence, and gives users substantial control over and steering of the model's behavior. Compared with Hermes 2, Hermes 3 offers improved function calling and structured output generation.

Architecture

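This model is a fine-tune of Llama 3.1 8B. It uses the ChatML prompt format, which marks the beginning and end of each turn with special tokens (<|im_start|> and <|im_end|>) and labels each turn with a role (system, user, or assistant), enabling structured multi-turn dialogue. The system prompt provides steerability, guiding the model's behavior, role, and style. Because ChatML is the same format used by OpenAI, the model is compatible with OpenAI-style chat endpoints and will feel familiar to anyone who has used the ChatGPT API.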
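To show what a ChatML prompt looks like as raw text, the helper below renders a list of role/content messages into a single prompt string and optionally opens an assistant turn for the model to complete. This is a minimal sketch: the function name build_chatml_prompt and the example messages are hypothetical, and in practice the tokenizer's apply_chat_template method can produce this formatting for you, assuming the model repository ships a ChatML chat template.

```python
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render {"role", "content"} messages as a ChatML prompt (hypothetical helper)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

The trailing "<|im_start|>assistant\n" matters for generation: without it, the model may continue the user turn instead of answering it.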

Training

Hermes 3 was trained on a diverse set of tasks, with a focus on aligning the model to user needs through strong steering and control. Building on the foundations of Hermes 2, it supports function calling, structured outputs, and general-purpose assistance.
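With function calling, the application is responsible for extracting the model's tool calls from its text output. The upstream Hermes model cards describe a convention in which each call is emitted as a JSON object with "name" and "arguments" keys wrapped in <tool_call>...</tool_call> tags; the sketch below assumes that convention, so check the model card for the exact format before relying on it. The helper parse_tool_calls and the get_weather tool are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# Assumed convention (not verified here): each call is a JSON object inside <tool_call>...</tool_call>
TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(completion: str) -> List[Dict[str, Any]]:
    """Extract JSON tool calls from a model completion (hypothetical helper)."""
    calls = []
    for raw in TOOL_CALL_PATTERN.findall(completion):
        try:
            calls.append(json.loads(raw.strip()))
        except json.JSONDecodeError:
            # Skip malformed JSON instead of failing the whole parse
            continue
    return calls


completion = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
calls = parse_tool_calls(completion)
print(calls)
```

Silently skipping malformed calls keeps the sketch simple; a production loop might instead send the parse error back to the model as a tool response.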

Guide: Running Locally

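  1. Setup: Install the required packages: torch, transformers, accelerate (needed for device_map="auto"), bitsandbytes (for 4-bit loading), sentencepiece, protobuf, and flash-attn (for Flash Attention 2, which requires a compatible GPU).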
  2. Load Model: Use the Hugging Face transformers library to load the Hermes model.
    import torch
    from transformers import AutoTokenizer, BitsAndBytesConfig, LlamaForCausalLM
    
    model_id = "NousResearch/Hermes-3-Llama-3.1-8B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    
    # Load the weights in 4-bit via bitsandbytes to reduce GPU memory use
    model = LlamaForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # requires the accelerate package
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        attn_implementation="flash_attention_2",  # requires flash-attn
    )
    
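  3. Inference: Format the prompt in ChatML, tokenize it, and generate a response.
    prompts = ["<|im_start|>system\n...<|im_end|>\n<|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n"]
    input_ids = tokenizer(prompts, return_tensors="pt").input_ids.to(model.device)
    # do_sample=True is required for temperature to take effect
    generated_ids = model.generate(input_ids, max_new_tokens=750, do_sample=True, temperature=0.8)
    # Decode only the newly generated tokens, i.e. the assistant's reply
    print(tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))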
    
  4. Cloud GPUs: If you lack a local GPU with enough memory, consider cloud GPU instances from providers such as AWS, GCP, or Azure to handle the computational demands of running Hermes 3.
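The memory savings from 4-bit loading can be estimated with simple arithmetic: weight storage is roughly the parameter count times the bits per parameter, divided by 8. The sketch below assumes about 8.0e9 parameters (taken from the "8B" in the model name, not an exact count) and covers weights only; the function name weight_memory_gib is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Rough weight-only footprint in GiB (ignores KV cache, activations, and quantization overhead)."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: ~8.0e9 parameters for an "8B" model
NUM_PARAMS = 8.0e9
for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Actual usage is higher than these figures: the KV cache grows with context length and batch size, and quantized formats store extra scaling metadata.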

License

Hermes 3 is distributed under the llama3 license, which governs its use and distribution. Ensure compliance with the specific terms outlined in this license when utilizing the model.
