QwQ-4B-Instruct
Introduction
QwQ-4B-Instruct is a fine-tuned language model designed for instruction-following and reasoning tasks. It is based on a quantized version of the Qwen2.5-7B model, optimized for faster inference and reduced memory usage while maintaining robust capabilities on complex tasks. The model excels at generating step-by-step solutions, creative content, and logical analyses, with an advanced understanding of both structured and unstructured data.
Architecture
QwQ-4B-Instruct integrates sophisticated natural language processing capabilities and supports instruction following, long text generation, and structured-data comprehension (e.g., tables, JSON). It offers long-context support for inputs of up to 128K tokens and can generate up to 8K output tokens. Multilingual support covers more than 29 languages, including Chinese, English, French, and Spanish.
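As an illustration of the structured-data comprehension capability, the snippet below builds a prompt that embeds a JSON record for the model to reason over. This is a minimal sketch: the record and the question are invented for demonstration, and the resulting prompt string is meant to be passed through the chat-template flow shown in the guide below.

```python
import json

# Hypothetical JSON record, invented purely for illustration.
record = {
    "order_id": 1042,
    "items": [
        {"sku": "A-17", "qty": 2, "unit_price": 4.50},
        {"sku": "B-03", "qty": 1, "unit_price": 12.00},
    ],
}

# Embed the structured data directly in the user prompt; the model is
# expected to read the JSON and answer the question about it.
prompt = (
    "Given the following order as JSON:\n"
    f"{json.dumps(record, indent=2)}\n"
    "What is the total cost of the order? Show your reasoning step by step."
)
print(prompt)  # feed this string through the chat template shown in the guide
```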
Training
The model is fine-tuned using QwQ-LCoT-7B-Instruct as the base model, with amphora/QwQ-LongCoT-130K as the training dataset. Fine-tuning targets enhanced performance in coding, mathematics, and diverse instruction-following tasks.
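The training dataset is publicly available on the Hugging Face Hub and can be inspected with the `datasets` library. The following is a minimal sketch assuming the standard `load_dataset` API; the split name and record fields are assumptions, so check the dataset card for the actual schema.

```python
from datasets import load_dataset

# Assumes a "train" split; verify the actual split and column names
# on the amphora/QwQ-LongCoT-130K dataset card.
ds = load_dataset("amphora/QwQ-LongCoT-130K", split="train")
print(ds)     # features and number of rows
print(ds[0])  # peek at one long chain-of-thought training example
```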
Guide: Running Locally
To run QwQ-4B-Instruct locally, follow these steps:
- Load the Model and Tokenizer: Use the `transformers` library to load the model and tokenizer.

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "prithivMLmods/QwQ-4B-Instruct"

  # device_map="auto" places weights on the available GPU(s)/CPU;
  # torch_dtype="auto" uses the checkpoint's native precision.
  model = AutoModelForCausalLM.from_pretrained(
      model_name,
      torch_dtype="auto",
      device_map="auto",
  )
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  ```
- Prepare Input and Generate Text: Use the tokenizer to prepare inputs and generate text. A complete, runnable version of this flow is sketched after this list.

  ```python
  prompt = "Give me a short introduction to large language models."
  messages = [
      {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
      {"role": "user", "content": prompt},
  ]

  # Render the chat template and move the input tensors to the model's device.
  text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

  generated_ids = model.generate(**model_inputs, max_new_tokens=512)

  # Strip the prompt tokens so only the newly generated reply is decoded.
  generated_ids = [
      output_ids[len(input_ids):]
      for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
  ]
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  print(response)
  ```
- Cloud GPU Recommendation: For efficient processing, consider using cloud GPUs such as those provided by AWS, Google Cloud, or Azure.
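Putting the steps above together, the following is a minimal end-to-end sketch. It additionally streams tokens to stdout as they are generated using transformers' `TextStreamer`; the 512-token limit is carried over from the step above, and the model name is the repository id given in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name = "prithivMLmods/QwQ-4B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Print decoded tokens as they are produced, skipping the prompt itself.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)
```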
License
The QwQ-4B-Instruct model is licensed under the Apache 2.0 License. This allows both personal and commercial use, modification, and distribution, provided the license terms, including attribution notices, are preserved.