Llama-Deepsync-1B-GGUF

prithivMLmods

Introduction

Llama-Deepsync-1B-GGUF is a fine-tuned version of the Llama-3.2-1B-Instruct base model, distributed in the GGUF format and optimized for text generation tasks that involve deep reasoning and logical structuring. It is particularly effective in education, programming, and creative writing, excelling at step-by-step solutions, creative content, and logical analysis. The model supports over 29 languages, including English, French, and Spanish.

Architecture

Llama 3.2 is an auto-regressive language model built on an optimized transformer architecture. It uses supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The model supports long-context input and generation, handling up to 128K tokens of context and generating up to 8K tokens.
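Because the fine-tune inherits the base model's configuration, the 128K-token context window can be checked without downloading the weights. A minimal sketch, assuming the prithivMLmods/Llama-Deepsync-1B repository used in the guide below (the 131072 value is an assumption derived from the stated 128K context):

    from transformers import AutoConfig

    # Load only the config (no weights) to inspect the context window.
    config = AutoConfig.from_pretrained("prithivMLmods/Llama-Deepsync-1B")
    print(config.max_position_embeddings)  # expected: 131072 (~128K tokens)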

Training

The model is fine-tuned to strengthen its capabilities in coding, mathematics, and instruction following. It is robust to diverse prompts, which improves role-play and condition-setting for chatbots. It also has improved support for generating structured outputs such as JSON and for understanding structured data such as tables.
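Structured output is requested through an ordinary chat prompt; no special API is involved. A minimal sketch, reusing the pipeline construction from the guide below (the JSON schema in the system prompt is illustrative, not part of the model):

    import torch
    from transformers import pipeline

    # Same pipeline setup as in the "Running Locally" guide below.
    pipe = pipeline(
        "text-generation",
        model="prithivMLmods/Llama-Deepsync-1B",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Constrain the reply format via the system prompt (schema is illustrative).
    messages = [
        {"role": "system", "content": 'Respond only with JSON of the form {"dish": str, "steps": [str]}.'},
        {"role": "user", "content": "How do I make a cup of tea?"},
    ]
    outputs = pipe(messages, max_new_tokens=256)
    print(outputs[0]["generated_text"][-1]["content"])  # the model's JSON reply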

Guide: Running Locally

  1. Install Transformers: Ensure you have transformers >= 4.43.0 by running pip install --upgrade transformers.
  2. Set Up Environment: Import necessary libraries and set up a pipeline for text generation.
    import torch
    from transformers import pipeline

    model_id = "prithivMLmods/Llama-Deepsync-1B"

    # bfloat16 halves memory use; device_map="auto" places the model
    # on a GPU when one is available, otherwise on CPU.
    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    
  3. Run Inference: Use the pipeline to generate responses.
    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
    outputs = pipe(messages, max_new_tokens=256)
    # The pipeline returns the whole conversation; the last entry is the
    # newly generated assistant message.
    print(outputs[0]["generated_text"][-1])
    
  4. Ollama Setup:
    • Install Ollama from the official site (https://ollama.com).
    • Create a Modelfile that points at the downloaded GGUF weights (see the sketch after this list).
    • Use ollama create to set up the model and ollama run to start it.
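    A minimal sketch of that workflow, assuming the GGUF weights were downloaded into the working directory (the filename and Q8_0 quantization suffix are assumptions; match the variant you actually pulled):

    # Modelfile
    # FROM points Ollama at the local GGUF weights (filename is an assumption).
    # A TEMPLATE/PARAMETER block for the Llama 3 chat format may also be needed.
    FROM ./Llama-Deepsync-1B.Q8_0.gguf

    Then register and start the model:

    ollama create llama-deepsync-1b -f Modelfile
    ollama run llama-deepsync-1b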

Cloud GPUs

For enhanced performance, consider using cloud GPU services like AWS or Google Cloud's AI Platform.

License

The model is licensed under the CreativeML Open RAIL-M license.
