GigaChat-20B-A3B-Instruct
Introduction
GigaChat-20B-A3B-Instruct is an instruction-tuned dialog model from the GigaChat family, built on GigaChat-20B-A3B-base. It supports contexts of up to 131K tokens and works in both Russian and English. The model weights are published in bf16 and int8 formats.
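The long context window is easiest to appreciate with a token count. Below is a minimal sketch (assuming only that the tokenizer loads as in the usage examples further down; `long_document` is a stand-in for your own text) that checks how much of the window a prompt consumes:

```python
from transformers import AutoTokenizer

model_name = "ai-sage/GigaChat-20B-A3B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Stand-in text; replace with the document you actually want to send.
long_document = "lorem ipsum " * 5000

token_count = len(tokenizer(long_document)["input_ids"])
print(f"Prompt uses {token_count} of the ~131K-token context window")
```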
Architecture
GigaChat-20B-A3B is a Mixture-of-Experts (MoE) model: the name encodes roughly 20B total parameters with about 3B activated per token, so inference costs closer to a small dense model while retaining the capacity of a larger one. The instruct variant is tuned for dialog generation with extended-context handling and is evaluated on a range of tasks, including mathematical problem-solving and general-knowledge benchmarks.
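To make the "many total, few active parameters" idea concrete, here is a toy top-k routing sketch in PyTorch. It is a generic illustration of how MoE layers route tokens, not GigaChat's actual code; all sizes and names are invented for the example:

```python
import torch
import torch.nn.functional as F

num_experts, top_k, hidden = 8, 2, 16   # toy sizes, not GigaChat's real config
tokens = torch.randn(4, hidden)         # embeddings for 4 tokens
router = torch.nn.Linear(hidden, num_experts)

# The router scores every expert for every token...
scores = F.softmax(router(tokens), dim=-1)
# ...but only the top-k experts per token are actually executed,
# so most expert parameters stay inactive on any given token.
weights, chosen = torch.topk(scores, top_k, dim=-1)

print(chosen)  # per-token indices of the few activated experts
```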
Training
Published benchmarks report results on GSM8K, MATH, and MMLU in both English and Russian, where the model performs strongly on instruction-following and general-knowledge tasks.
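As a minimal sketch of the kind of check these benchmarks run, the snippet below poses one GSM8K-style word problem and pulls the last number out of the reply. The question and the regex-based answer parsing are illustrative assumptions, not the official evaluation protocol:

```python
import re
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ai-sage/GigaChat-20B-A3B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)

# One made-up GSM8K-style problem, not an item from the actual dataset.
question = "A shop sells pens at 3 dollars each. How much do 7 pens cost? Answer with a number."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}], add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs.to(model.device), max_new_tokens=256)
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

# Naive scoring: take the last number that appears in the reply.
numbers = re.findall(r"-?\d+(?:\.\d+)?", reply)
print(reply, "->", numbers[-1] if numbers else "no number found")
```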
Guide: Running Locally
To run the GigaChat-20B-A3B-Instruct model locally, follow these steps:
- Install Requirements: Ensure you have transformers>=4.47 installed (for example, via `pip install "transformers>=4.47"`).
- Example Usage with Transformers:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "ai-sage/GigaChat-20B-A3B-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.generation_config = GenerationConfig.from_pretrained(model_name)

# "Докажи теорему о неподвижной точке" = "Prove the fixed-point theorem"
messages = [{"role": "user", "content": "Докажи теорему о неподвижной точке"}]
input_tensor = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_tensor.to(model.device))

# Decode only the newly generated tokens, skipping the prompt.
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=False)
print(result)
```

For loading the model in 8-bit instead of bf16, see the quantization sketch after these steps.
- Example Usage with vLLM:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "ai-sage/GigaChat-20B-A3B-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, trust_remote_code=True)

# temperature=0.3 keeps output fairly deterministic; max_tokens caps completion length.
sampling_params = SamplingParams(temperature=0.3, max_tokens=8192)

# "Докажи теорему о неподвижной точке" = "Prove the fixed-point theorem"
messages_list = [
    [{"role": "user", "content": "Докажи теорему о неподвижной точке"}],
]
prompt_token_ids = [
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    for messages in messages_list
]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```
- Cloud GPUs: Consider using cloud GPU services for best performance, given the model's large size and computational needs.
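As referenced in the Transformers example above: if you want 8-bit inference without the published int8 weights at hand, a possible sketch uses transformers' bitsandbytes integration. This assumes the bitsandbytes package is installed and is not necessarily how the official int8 weights were produced:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "ai-sage/GigaChat-20B-A3B-instruct"

# Quantize the bf16 checkpoint to 8-bit at load time (requires bitsandbytes).
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    quantization_config=quant_config,
    device_map="auto",
)
```

From here, generation works exactly as in the bf16 example above.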
License
The GigaChat-20B-A3B-Instruct model is licensed under the MIT License, allowing for flexible use in both personal and commercial projects.