MOXIN-CHAT-7B Model

Introduction

MOXIN-CHAT-7B is a conversational model developed by MOXIN-ORG, designed for generating human-like text responses. The model is based on a 7 billion parameter architecture and has been fine-tuned to enhance its conversational capabilities. More details and technical specifications can be found in the technical report.

Architecture

The MOXIN-CHAT-7B model is implemented in PyTorch and belongs to the Mistral model family. It can be deployed via inference endpoints, is compatible with the GGUF format, and is tuned specifically for conversational tasks.
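
As a rough illustration of the GGUF route, a quantized GGUF export of the model could be run with a llama.cpp binding such as llama-cpp-python; this is a minimal sketch, and the file name below is a placeholder for whichever GGUF quantization you actually obtain:

    from llama_cpp import Llama  # pip install llama-cpp-python
    
    # Load a GGUF quantization of the model (placeholder file name).
    llm = Llama(model_path="./moxin-chat-7b.Q4_K_M.gguf", n_ctx=4096)
    
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "What is regularization in machine learning?"}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])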

Training

The model was initially trained on a diverse mix of datasets to build robust language understanding and generation capabilities, and then fine-tuned on the Tulu v2 dataset to improve its performance in conversational contexts. The resulting model was evaluated on several benchmarks, including ARC, HellaSwag, MMLU, and Winogrande.
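
The benchmark numbers could, in principle, be reproduced with a standard evaluation harness; the sketch below assumes the lm-evaluation-harness package (lm_eval) and its stock task names, neither of which is specified in this card:

    import lm_eval  # pip install lm-eval
    
    # Score the chat model on a subset of the benchmarks listed above.
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=moxin-org/moxin-chat-7b,dtype=bfloat16",
        tasks=["arc_challenge", "hellaswag", "winogrande"],
        batch_size=8,
    )
    print(results["results"])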

Guide: Running Locally

To run the MOXIN-CHAT-7B model locally, follow these steps:

  1. Install Dependencies: Ensure you have Python and PyTorch installed. Install the transformers library from Hugging Face, along with accelerate, which is required for the device_map="auto" loading used below:

    pip install transformers torch accelerate
    
  2. Download the Model: Obtain the model from the Hugging Face model hub.
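
     If you prefer to pre-fetch the weights rather than rely on the automatic download in the next step, one option is the huggingface_hub client; this is a minimal sketch that simply reuses the repo id from the inference example:

    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    
    # Download (or reuse the cached copy of) the model repository and return its local path.
    local_dir = snapshot_download(repo_id="moxin-org/moxin-chat-7b")
    print(local_dir)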

  3. Run Inference:

    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
    
    model_name = 'moxin-org/moxin-chat-7b'
    
    # Load the tokenizer and model; device_map="auto" places the weights on the
    # available GPU(s) and requires the accelerate package.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
    
    # Wrap the already-loaded model and tokenizer in a text-generation pipeline
    # (dtype and device placement are taken from the model itself).
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
    
    prompt = "Can you explain the concept of regularization in machine learning?"
    sequences = pipe(prompt, do_sample=True, max_new_tokens=1000, temperature=0.7, top_k=50, top_p=0.95, num_return_sequences=1)
    print(sequences[0]['generated_text'])
    
  4. Chat Template:

    # Continue with the model and tokenizer loaded above; the chat template
    # formats the conversation the way the model expects.
    messages = [
        {"role": "user", "content": "What is your favourite condiment?"},
        {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice..."},
        {"role": "user", "content": "Do you have mayonnaise recipes?"}
    ]
    
    # apply_chat_template returns the prompt as token ids; move them to the device
    # the model was dispatched to (do not call model.to("cuda") on a model that
    # was loaded with device_map="auto").
    encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
    model_inputs = encodeds.to(model.device)
    
    generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
    decoded = tokenizer.batch_decode(generated_ids)
    print(decoded[0])
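
     As an optional follow-up, the reply can be streamed to stdout while it is being generated; this sketch uses the transformers TextStreamer helper together with the tensors prepared above:

    from transformers import TextStreamer
    
    # Print tokens as they are generated instead of waiting for the full sequence.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    _ = model.generate(model_inputs, max_new_tokens=1000, do_sample=True, streamer=streamer)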
    

For better performance when running a model of this size, it is recommended to use a cloud GPU instance from a provider such as AWS EC2, Google Cloud, or Azure.
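
If only a smaller GPU is available, one common fallback is to load the model with 4-bit quantization via bitsandbytes; the sketch below is an illustrative option, not a configuration recommended for this model, and it additionally requires the bitsandbytes package:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    
    model_name = 'moxin-org/moxin-chat-7b'
    
    # Quantize the weights to 4-bit NF4 at load time to cut memory usage substantially.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config, device_map="auto")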

License

The MOXIN-CHAT-7B model is released under the Apache 2.0 License, allowing for both commercial and non-commercial use with attribution.
