Hermes 3 Llama 3.1 70 B
NousResearchIntroduction
Hermes 3 is the latest version of the Hermes series of language models by Nous Research. It offers numerous improvements over its predecessor, Hermes 2, including enhanced reasoning, multi-turn conversation capabilities, and advanced roleplaying features.
Architecture
Hermes 3 is a generalist language model designed to align closely with user needs, providing advanced control and steering capabilities. It builds on the capabilities of Hermes 2 with improved function calling, structured output capabilities, and code generation skills.
Training
The model has been trained to excel in general capabilities, often outperforming Llama-3.1 Instruct models. It uses ChatML prompt formatting, which supports structured multi-turn chat dialogues, and is compatible with OpenAI endpoints. The model is also trained for function calling and structured output scenarios.
Guide: Running Locally
To run Hermes 3 locally:
- Install Dependencies: Ensure you have
pytorch
,transformers
,bitsandbytes
,sentencepiece
,protobuf
, andflash-attn
installed. - Load the Model:
import torch from transformers import AutoTokenizer, LlamaForCausalLM tokenizer = AutoTokenizer.from_pretrained('NousResearch/Hermes-3-Llama-3.1-70B', trust_remote_code=True) model = LlamaForCausalLM.from_pretrained( "NousResearch/Hermes-3-Llama-3.1-70B", torch_dtype=torch.float16, device_map="auto", load_in_8bit=False, load_in_4bit=True, use_flash_attention_2=True )
- Run Inference: Use the tokenizer and model to generate responses from input prompts.
- Cloud GPUs: Consider using cloud services like AWS, GCP, or Azure for access to powerful GPUs.
License
The model is released under the Llama3 license. Please refer to the license terms on the Hugging Face model card for more details.