Vikhr-Llama3.1-8B-Instruct-R-21-09-24

Vikhrmodels

Introduction

Vikhr-Llama3.1-8B-Instruct-R-21-09-24 is a large language model (LLM) developed by VikhrModels. It is an enhanced version of the Meta-Llama-3.1-8B-Instruct model, optimized for Russian and English. The model is designed for a range of applications, including reasoning, summarization, coding, roleplay, and multi-turn dialogue. It supports multilingual generation and high-performance RAG (Retrieval-Augmented Generation).

Architecture

The model features:

  • High-quality generation in Russian, English, and other languages.
  • Support for system prompts to regulate response style.
  • Context handling up to 128k tokens using RoPE scaling.
  • A Grounded RAG mode for document-based question answering.
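As an illustrative sketch of how a Grounded RAG request might be assembled, the helper below packages retrieved documents and a user question into a chat-message list. The `documents` role and the JSON field names (`doc_id`, `title`, `content`) are assumptions for illustration, not a format confirmed by this card; consult the model repository for the exact schema.

```python
import json

# Hypothetical helper: build a message list for a Grounded RAG query.
# The "documents" role and the document field names are assumptions.
def build_rag_messages(documents, question, system_prompt=None):
    messages = []
    if system_prompt:
        # System prompts can be used to regulate response style.
        messages.append({"role": "system", "content": system_prompt})
    # Documents are serialized as JSON so the model can ground its answer.
    messages.append({
        "role": "documents",
        "content": json.dumps(documents, ensure_ascii=False),
    })
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_rag_messages(
    documents=[{"doc_id": 0, "title": "RoPE", "content": "RoPE scaling extends context."}],
    question="How does the model handle long contexts?",
    system_prompt="Answer using only the provided documents.",
)
```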

Training

The training process involved several stages, including Supervised Fine-Tuning (SFT) and a custom alignment stage using SMPO (a variation of DPO). Key datasets used include:

  • Vikhrmodels/GrandMaster-PRO-MAX, featuring a large set of synthetic instructions with built-in Chain-Of-Thought.
  • Vikhrmodels/Grounded-RAG-RU-v2, designed for RAG grounding with complex dialogue structures.

The alignment stage involved custom Reward Model training, dataset filtering, Rejection Sampling, and continued training with SMPO to enhance response quality.
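The exact SMPO recipe is not detailed here, but the Rejection Sampling step can be sketched generically: sample several candidate responses per prompt, score each with a reward model, and keep only prompts whose best candidate clears a threshold. The scoring function and threshold below are stand-ins for illustration, not the actual reward model used by VikhrModels.

```python
# Generic Rejection Sampling sketch for building alignment data.
# `generate` and `reward` are stand-in callables, not the real models.
def rejection_sample(prompts, generate, reward, n_candidates=4, threshold=0.0):
    kept = []
    for prompt in prompts:
        # Sample several candidate responses for the same prompt.
        candidates = [generate(prompt) for _ in range(n_candidates)]
        # Score each candidate with the reward model; keep the best one
        # only if it clears the quality threshold.
        best = max(candidates, key=lambda c: reward(prompt, c))
        if reward(prompt, best) >= threshold:
            kept.append((prompt, best))
    return kept
```

The surviving (prompt, response) pairs would then feed continued training with SMPO.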

Guide: Running Locally

  1. Prerequisites: Ensure you have Python installed, along with the libraries needed to serve and query the model (e.g. vLLM and an HTTP client), and a GPU environment with enough memory for an 8B model.
  2. Download the Model: Access the model from the Hugging Face repository.
  3. Serving the Model: Use vLLM to serve the model:
    vllm serve --dtype half --max-model-len 32000 -tp 1 Vikhrmodels/Vikhr-Llama3.1-8B-Instruct-R-21-09-24 --api-key token-abc123
    
  4. API Interaction: Send chat-completion requests to the server's OpenAI-compatible endpoint, supplying system prompts and (for Grounded RAG) documents as needed.
  5. Consider Cloud GPUs: For optimal performance, consider using cloud-based GPUs like those offered by AWS, Google Cloud, or Azure.
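The steps above can be sketched with the standard library alone. The helper below builds a chat-completion request for the vLLM server started in step 3; the base URL assumes vLLM's default port 8000, and the API key matches the `--api-key` flag shown above.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"  # vLLM default port
API_KEY = "token-abc123"  # matches the --api-key flag in step 3
MODEL = "Vikhrmodels/Vikhr-Llama3.1-8B-Instruct-R-21-09-24"

def build_request(messages, temperature=0.0, max_tokens=512):
    """Assemble an OpenAI-compatible chat-completion request for vLLM."""
    body = json.dumps({
        "model": MODEL,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )

req = build_request([{"role": "user", "content": "Привет! Кто ты?"}])
# Sending the request requires the server from step 3 to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```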

License

This model is released under the Apache-2.0 license, which permits use, distribution, and modification, provided the license terms (including attribution and notice preservation) are met.
