GigaChat-20B-A3B-Instruct
Introduction
GigaChat-20B-A3B-Instruct is an instruction-tuned dialog model from the GigaChat family, built on GigaChat-20B-A3B-base. It supports contexts of up to 131K tokens and works in both Russian and English. The model weights are published in bf16 and int8 formats.
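The long context window is easiest to appreciate with a token count. Below is a minimal sketch (assuming only that the tokenizer loads as in the usage examples further down; `long_document` is a stand-in for your own text) that checks how much of the window a prompt consumes:

```python
from transformers import AutoTokenizer

model_name = "ai-sage/GigaChat-20B-A3B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Stand-in text; replace with the document you actually want to send.
long_document = "lorem ipsum " * 5000

token_count = len(tokenizer(long_document)["input_ids"])
print(f"Prompt uses {token_count} of the ~131K-token context window")
```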
Architecture
GigaChat-20B-A3B is a Mixture-of-Experts (MoE) model: the name encodes roughly 20B total parameters with about 3B activated per token, so inference costs closer to a small dense model while retaining the capacity of a larger one. The instruct variant is tuned for dialog generation with extended-context handling and is evaluated on a range of tasks, including mathematical problem-solving and general-knowledge benchmarks.
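To make the "many total, few active parameters" idea concrete, here is a toy top-k routing sketch in PyTorch. It is a generic illustration of how MoE layers route tokens, not GigaChat's actual code; all sizes and names are invented for the example:

```python
import torch
import torch.nn.functional as F

num_experts, top_k, hidden = 8, 2, 16   # toy sizes, not GigaChat's real config
tokens = torch.randn(4, hidden)         # embeddings for 4 tokens
router = torch.nn.Linear(hidden, num_experts)

# The router scores every expert for every token...
scores = F.softmax(router(tokens), dim=-1)
# ...but only the top-k experts per token are actually executed,
# so most expert parameters stay inactive on any given token.
weights, chosen = torch.topk(scores, top_k, dim=-1)

print(chosen)  # per-token indices of the few activated experts
```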
Training
Published benchmarks report results on GSM8K, MATH, and MMLU in both English and Russian, where the model performs strongly on instruction-following and general-knowledge tasks.
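As a minimal sketch of the kind of check these benchmarks run, the snippet below poses one GSM8K-style word problem and pulls the last number out of the reply. The question and the regex-based answer parsing are illustrative assumptions, not the official evaluation protocol:

```python
import re
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ai-sage/GigaChat-20B-A3B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)

# One made-up GSM8K-style problem, not an item from the actual dataset.
question = "A shop sells pens at 3 dollars each. How much do 7 pens cost? Answer with a number."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}], add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs.to(model.device), max_new_tokens=256)
reply = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)

# Naive scoring: take the last number that appears in the reply.
numbers = re.findall(r"-?\d+(?:\.\d+)?", reply)
print(reply, "->", numbers[-1] if numbers else "no number found")
```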
Guide: Running Locally
To run the GigaChat-20B-A3B-Instruct model locally, follow these steps:
- Install Requirements: Ensure you have transformers>=4.47 installed (for example, via `pip install "transformers>=4.47"`).
- Example Usage with Transformers:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "ai-sage/GigaChat-20B-A3B-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.generation_config = GenerationConfig.from_pretrained(model_name)

# "Докажи теорему о неподвижной точке" = "Prove the fixed-point theorem"
messages = [{"role": "user", "content": "Докажи теорему о неподвижной точке"}]
input_tensor = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_tensor.to(model.device))

# Decode only the newly generated tokens, skipping the prompt.
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=False)
print(result)
```

For loading the model in 8-bit instead of bf16, see the quantization sketch after these steps.
- Example Usage with vLLM:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "ai-sage/GigaChat-20B-A3B-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, trust_remote_code=True)

# temperature=0.3 keeps output fairly deterministic; max_tokens caps completion length.
sampling_params = SamplingParams(temperature=0.3, max_tokens=8192)

# "Докажи теорему о неподвижной точке" = "Prove the fixed-point theorem"
messages_list = [
    [{"role": "user", "content": "Докажи теорему о неподвижной точке"}],
]
prompt_token_ids = [
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    for messages in messages_list
]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```
- Cloud GPUs: Consider using cloud GPU services for best performance, given the model's large size and computational needs.
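As referenced in the Transformers example above: if you want 8-bit inference without the published int8 weights at hand, a possible sketch uses transformers' bitsandbytes integration. This assumes the bitsandbytes package is installed and is not necessarily how the official int8 weights were produced:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "ai-sage/GigaChat-20B-A3B-instruct"

# Quantize the bf16 checkpoint to 8-bit at load time (requires bitsandbytes).
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    quantization_config=quant_config,
    device_map="auto",
)
```

From here, generation works exactly as in the bf16 example above.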
License
The GigaChat-20B-A3B-Instruct model is licensed under the MIT License, allowing for flexible use in both personal and commercial projects.