saiga_nemo_12b

IlyaGusev

Introduction

SAIGA/MISTRALNEMO 12B is a Russian fine-tune of the Mistral Nemo model. It is designed to serve as a Russian-speaking assistant that can hold conversations with users and help with a wide range of queries.

Architecture

SAIGA/MISTRALNEMO 12B is built on the Mistral Nemo model. It uses a chat prompt format in which the system prompt is placed at the beginning of the conversation, followed by alternating user and assistant turns. llama.cpp-compatible versions of the model are available, and a Google Colab notebook is provided for experimentation.
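
As a concrete illustration, the snippet below renders a conversation through the tokenizer's built-in chat template so you can inspect where the system prompt lands. The system text here is a placeholder for illustration, not the model's official default.

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("IlyaGusev/saiga_nemo_12b")

    # The system message goes first, before any user turns.
    # This system text is a placeholder, not the model's shipped default.
    messages = [
        {"role": "system", "content": "You are a helpful Russian-speaking assistant."},
        {"role": "user", "content": "Why is grass green?"},
    ]
    # Render to a string (tokenize=False) to see the raw prompt layout.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)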

Training

The model is trained on datasets such as IlyaGusev/saiga_scored and IlyaGusev/saiga_preferences. It has gone through several iterations (v1, v2, v3), each with its own dataset and model configuration, combining SFT (Supervised Fine-Tuning) and SimPO (Simple Preference Optimization) strategies. The training process is documented in Weights & Biases (wandb) runs that record the experimental setups and outcomes.
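
For context, SimPO optimizes a length-normalized log-likelihood margin between a preferred and a rejected response. The sketch below shows that objective in PyTorch; it is illustrative only, not this repository's actual training code, and the beta/gamma values are placeholders.

    import torch
    import torch.nn.functional as F

    def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
                   beta=2.0, gamma=1.0):
        # Length-normalized "rewards": average log-probability per token,
        # scaled by beta. The beta and gamma values here are placeholders,
        # not the hyperparameters used for this model.
        reward_chosen = beta * logp_chosen / len_chosen
        reward_rejected = beta * logp_rejected / len_rejected
        # Push the chosen reward above the rejected one by margin gamma.
        return -F.logsigmoid(reward_chosen - reward_rejected - gamma).mean()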

Guide: Running Locally

To run the model locally, follow these steps:

  1. Setup Environment: Ensure you have Python and PyTorch installed. It is recommended to use a virtual environment.

  2. Install Transformers: Use pip install transformers to get the library. The snippet below also needs accelerate (for device_map="auto") and bitsandbytes (for 8-bit loading), installable the same way.

  3. Load Model and Tokenizer:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
    
    MODEL_NAME = "IlyaGusev/saiga_nemo_12b"
    
    # Load in 8-bit (requires bitsandbytes); device_map="auto" places the
    # weights across available devices (requires accelerate).
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        load_in_8bit=True,
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )
    model.eval()  # inference mode: disables dropout
    
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # Generation defaults (sampling settings, EOS handling) shipped with the model.
    generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
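
Note that passing load_in_8bit directly to from_pretrained is deprecated in recent transformers releases. A sketch of the equivalent call using the explicit BitsAndBytesConfig API (same 8-bit quantization):

    from transformers import BitsAndBytesConfig

    # Equivalent 8-bit loading via an explicit quantization config.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        torch_dtype=torch.bfloat16,
        device_map="auto"
    )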
    
  4. Inference: Apply the chat template to each query, then generate and decode the model's reply:

    inputs = ["Why is grass green?", "Write a long story mentioning: Tanya, ball"]
    for query in inputs:
        # Wrap the query in the model's chat template (adds role markers and
        # the assistant-turn prefix via add_generation_prompt=True).
        prompt = tokenizer.apply_chat_template([{
            "role": "user",
            "content": query
        }], tokenize=False, add_generation_prompt=True)
        # The template already contains special tokens, so don't add them again.
        data = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
        data = {k: v.to(model.device) for k, v in data.items()}
        data.pop("token_type_ids", None)  # generate() does not accept this key
        output_ids = model.generate(**data, generation_config=generation_config)[0]
        # Strip the prompt tokens, keeping only the newly generated reply.
        output_ids = output_ids[len(data["input_ids"][0]):]
        output = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
        print(query)
        print(output)
        print()
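
Individual decoding settings can also be overridden per call without editing the shipped generation_config, since keyword arguments passed to generate() take precedence over config fields. The values below are illustrative, not recommended settings.

    # Per-call overrides take precedence over generation_config fields.
    output_ids = model.generate(
        **data,
        generation_config=generation_config,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.5
    )[0]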
    
  5. Cloud GPUs: For better performance, consider using cloud GPU services such as Google Colab, AWS, or Azure.

License

The SAIGA/MISTRALNEMO 12B model is released under the Apache 2.0 license, which allows for both commercial and non-commercial use, modification, and distribution.
