saiga_nemo_12b
IlyaGusev

Introduction
The SAIGA/MISTRALNEMO 12B is a Russian fine-tuned version of the Mistral Nemo model. It is designed to serve as an automated Russian-speaking assistant that can converse with users and help them with a wide range of queries.
Architecture
SAIGA/MISTRALNEMO 12B is based on a modified version of the Mistral Nemo model. It uses a chat prompt format in which the system prompt is placed at the beginning of the conversation. llama.cpp-compatible versions of the model are also available, and a Google Colab notebook is provided for experimentation.
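To see how the system prompt is placed, you can render a conversation through the chat template bundled with the tokenizer. The sketch below is illustrative only; the system prompt text is a placeholder, not the model's canonical one:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("IlyaGusev/saiga_nemo_12b")

# Placeholder system prompt; the tokenizer's bundled chat template
# renders the conversation with the system message first.
messages = [
    {"role": "system", "content": "You are Saiga, a helpful Russian-speaking assistant."},
    {"role": "user", "content": "Why is grass green?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # inspect how the system prompt lands at the beginning
```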
Training
The model is trained on datasets such as IlyaGusev/saiga_scored and IlyaGusev/saiga_preferences. It has gone through several iterations (v1, v2, v3), each with its own dataset and model configuration, combining SFT (Supervised Fine-Tuning) and SimPO (Simple Preference Optimization) strategies. The training process is documented in various Weights & Biases (wandb) runs, showcasing the experimental setups and outcomes.
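For intuition, SimPO optimizes a reference-free preference objective built from length-normalized log-probabilities of chosen and rejected responses. The following is an illustrative sketch of that loss, not the author's actual training code; beta and gamma are placeholder hyperparameters:

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps,
               chosen_lengths, rejected_lengths,
               beta=2.0, gamma=0.5):
    # Length-normalized implicit rewards: SimPO uses the average
    # log-probability as the reward and needs no reference model.
    chosen_rewards = beta * chosen_logps / chosen_lengths
    rejected_rewards = beta * rejected_logps / rejected_lengths
    # Bradley-Terry style loss with a target reward margin gamma.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```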
Guide: Running Locally
To run the model locally, follow these steps:
- Setup Environment: Ensure you have Python and PyTorch installed. It is recommended to use a virtual environment.
- Install Transformers: Use pip install transformers to get the necessary library.
- Load Model and Tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

MODEL_NAME = "IlyaGusev/saiga_nemo_12b"

# Load in 8-bit to reduce memory use; this requires the bitsandbytes package.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    load_in_8bit=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
```
- Inference: Create prompts and generate outputs using the model's generation capabilities (a multi-turn variant is sketched after this list).

```python
inputs = ["Why is grass green?", "Write a long story mentioning: Tanya, ball"]
for query in inputs:
    # Render the user message with the model's chat template.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": query}],
        tokenize=False,
        add_generation_prompt=True
    )
    data = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    data = {k: v.to(model.device) for k, v in data.items()}
    data.pop("token_type_ids", None)
    output_ids = model.generate(**data, generation_config=generation_config)[0]
    # Strip the prompt tokens, keeping only the generated completion.
    output_ids = output_ids[len(data["input_ids"][0]):]
    output = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
    print(query)
    print(output)
    print()
```
- Cloud GPUs: For better performance, consider using cloud GPU services such as Google Colab, AWS, or Azure.
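The inference loop above handles single, independent queries. For a multi-turn conversation, the full message history can be re-rendered through the chat template on every turn. This is a minimal sketch under that assumption, reusing the model, tokenizer, and generation_config loaded above and assuming the template accepts the standard user/assistant role names:

```python
# Minimal multi-turn loop: the whole history is re-rendered each turn.
history = []

def chat(user_message):
    history.append({"role": "user", "content": user_message})
    prompt = tokenizer.apply_chat_template(
        history, tokenize=False, add_generation_prompt=True
    )
    data = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    data = {k: v.to(model.device) for k, v in data.items()}
    data.pop("token_type_ids", None)
    output_ids = model.generate(**data, generation_config=generation_config)[0]
    output_ids = output_ids[len(data["input_ids"][0]):]
    reply = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Why is grass green?"))
print(chat("Explain it to a five-year-old."))
```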
License
The SAIGA/MISTRALNEMO 12B model is released under the Apache 2.0 license, which allows for both commercial and non-commercial use, modification, and distribution.