GWQ-9B-Preview

prithivMLmods

Introduction

GWQ-9B-Preview is part of the Gemma with Questions (GWQ) family, developed using research and technology from Google's Gemini models. These text-to-text, decoder-only large language models are designed for tasks such as question answering, summarization, and reasoning. The GWQ models are English-language and are released with open weights for both pre-trained and instruction-tuned variants, built on the Gemma2ForCausalLM architecture.

Architecture

  1. Transformer-Based Design: Utilizes self-attention mechanisms for processing input and capturing contextual relationships.
  2. Lightweight and Efficient: Fewer parameters make it suitable for resource-constrained environments.
  3. Modular Layers: Consists of stacked decoder layers (the model is decoder-only) that adapt to tasks like text generation and summarization.
  4. Attention Mechanisms: Multi-head self-attention enhances the handling of long-range dependencies.
  5. Pre-training and Fine-Tuning: Pre-trained on large corpora and fine-tuned for specific tasks to improve performance.
  6. Scalability: Can scale according to application needs, balancing performance and resource use.
  7. Open-Source and Customizable: Allows modifications and extensions for specific use cases.
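
A quick way to confirm these architectural details is to inspect the model configuration without downloading the weights. This is a minimal sketch using the standard transformers AutoConfig API; it assumes the repository exposes a standard Gemma-2 configuration.

    from transformers import AutoConfig

    # Fetch only the configuration file (no weights are downloaded).
    config = AutoConfig.from_pretrained("prithivMLmods/GWQ-9B-Preview")

    print(config.model_type)           # expected: "gemma2"
    print(config.num_hidden_layers)    # number of decoder layers
    print(config.num_attention_heads)  # multi-head self-attention heads
    print(config.hidden_size)          # model width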

Training

GWQ is fine-tuned on the Chain of Continuous Thought Synthetic Dataset, which strengthens its reasoning and problem-solving capabilities. The model's architecture supports both pre-training on broad corpora and fine-tuning for domain-specific tasks.
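
The fine-tuning recipe itself follows the usual causal-language-modeling workflow. The sketch below uses the Hugging Face Trainer; the dataset name "your_dataset" and all hyperparameters are illustrative placeholders, not values from the actual GWQ training run.

    import torch
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    model_id = "prithivMLmods/GWQ-9B-Preview"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # Placeholder dataset; substitute your own reasoning corpus with a "text" column.
    dataset = load_dataset("your_dataset", split="train")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="gwq-9b-finetuned",
            per_device_train_batch_size=1,
            gradient_accumulation_steps=8,
            num_train_epochs=1,
            bf16=True,
            logging_steps=10,
        ),
        train_dataset=tokenized,
        # mlm=False gives the causal LM objective; labels are inputs shifted inside the model.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()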

Guide: Running Locally

  1. Install Dependencies:
    pip install -U transformers accelerate
    
  2. Load Model and Tokenizer:
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM
    
    tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/GWQ-9B-Preview")
    model = AutoModelForCausalLM.from_pretrained(
        "prithivMLmods/GWQ-9B-Preview",
        device_map="auto",
        torch_dtype=torch.bfloat16,
    )
    
  3. Generate Text:
    input_text = "Write me a poem about Machine Learning."
    # Move the tokenized inputs to the GPU, where device_map="auto" placed the model.
    input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
    
    outputs = model.generate(**input_ids, max_new_tokens=32)
    print(tokenizer.decode(outputs[0]))
    
  4. Cloud GPU Suggestion: A 9B-parameter model loaded in bfloat16 needs roughly 18 GB of GPU memory, so for optimal performance consider cloud GPUs from providers like AWS, Google Cloud, or Azure.
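
Gemma-2 instruction-tuned variants ship with a chat template, so prompts can also be formatted through the tokenizer rather than passed as raw text. A brief sketch, reusing the tokenizer and model from the steps above and assuming the repository includes the standard Gemma-2 chat template:

    # Format a conversation with the model's built-in chat template.
    messages = [
        {"role": "user", "content": "Explain chain-of-thought prompting in one sentence."},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,  # append the assistant-turn marker
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(input_ids, max_new_tokens=128)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))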

License

The model is licensed under the Gemma license.
