FastLlama-3.2-3B-Instruct

suayptalha

Introduction

FastLlama-3.2-3B-Instruct is a refined model based on Llama-3.2-3B-Instruct, optimized for constrained environments where both speed and accuracy matter. It is fine-tuned for mathematical reasoning and problem-solving on the MetaMathQA-50k dataset.

Architecture

FastLlama-3.2-3B-Instruct is lightweight and fast, retaining Llama-class capabilities with a smaller computational footprint. As an instruction-tuned model, it is robust at understanding and executing complex queries. The model supports multiple languages, including English, German, Spanish, French, Italian, Portuguese, Hindi, and Thai.
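
As a quick illustration of the multilingual chat format, the sketch below renders a Spanish prompt through the model's chat template; the model_id matches the guide below, while the message contents are illustrative.

    from transformers import AutoTokenizer

    model_id = "suayptalha/FastLlama-3.2-3B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # A Spanish math question, illustrating multilingual instruction following.
    messages = [
        {"role": "system", "content": "Eres un asistente útil."},
        {"role": "user", "content": "¿Cuánto es 17 por 23?"},
    ]

    # Render the conversation into the model's prompt format without tokenizing,
    # so the resulting prompt string can be inspected directly.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)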

Training

The model was fine-tuned using the MetaMathQA-50k subset of the HuggingFaceTB/smoltalk dataset, focusing on mathematical reasoning, problem-solving, and logical inference. The training utilized a learning rate of 2e-4, ran for one epoch, and employed the AdamW optimizer within the Unsloth framework.
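
As a rough sketch of that setup (not the exact training script: the LoRA configuration, sequence length, batch size, 4-bit loading, and smoltalk subset name are all assumptions), a typical Unsloth fine-tuning run with TRL looks like this:

    from unsloth import FastLanguageModel
    from datasets import load_dataset
    from trl import SFTTrainer
    from transformers import TrainingArguments

    # Load the base model through Unsloth (4-bit loading is an assumption).
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="meta-llama/Llama-3.2-3B-Instruct",  # gated base model
        max_seq_length=2048,
        load_in_4bit=True,
    )

    # Attach LoRA adapters; rank and target modules are illustrative defaults.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )

    # The subset name below is assumed; check the smoltalk dataset card.
    dataset = load_dataset("HuggingFaceTB/smoltalk", "metamathqa-50k", split="train")
    # Flatten each chat into a single training string via the chat template.
    dataset = dataset.map(lambda ex: {
        "text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)
    })

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=2048,
        args=TrainingArguments(
            learning_rate=2e-4,     # as reported above
            num_train_epochs=1,     # one epoch, as reported
            optim="adamw_torch",    # AdamW optimizer
            per_device_train_batch_size=2,
            output_dir="outputs",
        ),
    )
    trainer.train()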

Guide: Running Locally

  1. Install Dependencies: Ensure you have PyTorch, Transformers, and Accelerate installed (Accelerate is required for device_map="auto").
    pip install torch transformers accelerate
    
  2. Load the Model:
    import torch
    from transformers import pipeline
    
    model_id = "suayptalha/FastLlama-3.2-3B-Instruct"
    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use
        device_map="auto",           # place the model on GPU when available
    )
    
  3. Run Inference:
    messages = [
        {"role": "system", "content": "You are a friendly assistant named FastLlama."},
        {"role": "user", "content": "Who are you?"},
    ]
    outputs = pipe(
        messages,
        max_new_tokens=256,
    )
    # For chat-style input, generated_text holds the whole conversation;
    # the last entry is the assistant's reply.
    print(outputs[0]["generated_text"][-1])
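
    Since the model is tuned for mathematical reasoning, a math prompt makes a natural smoke test (the question below is illustrative):

    math_messages = [
        {"role": "system", "content": "You are a careful math tutor. Show your reasoning step by step."},
        {"role": "user", "content": "A train travels 180 km in 2.5 hours. What is its average speed in km/h?"},
    ]
    outputs = pipe(math_messages, max_new_tokens=256)
    # The reply should work through 180 / 2.5 = 72 km/h.
    print(outputs[0]["generated_text"][-1]["content"])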
    
  4. Cloud GPUs: For optimal performance, especially with larger models, consider a cloud GPU service such as AWS, Google Cloud, or Azure.

License

This model is licensed under the Apache 2.0 License. See the LICENSE file for detailed terms.
