Llama-Deepsync-3B

prithivMLmods

Introduction

The Llama-Deepsync-3B is a fine-tuned version of the Llama-3.2-3B-Instruct base model, optimized for text generation tasks. It is designed to handle complex queries requiring deep reasoning, logical structuring, and problem-solving, making it suitable for applications in education, programming, and creative writing. The model can generate structured outputs such as JSON and supports multilingual text generation in over 29 languages.
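
When the model is asked for JSON, it may still wrap the structured part in conversational text, so callers usually need to pull the JSON out of the reply. The sketch below uses only Python's standard json module. The helper name extract_json is hypothetical and is not part of Transformers or of this model's tooling, and the sample reply is invented for illustration rather than real model output.

```python
import json
from typing import Any, Optional


def extract_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array found in `text`, or None.

    Hypothetical helper: it scans for '{' or '[' and asks json.JSONDecoder
    to decode one value starting at that position, ignoring any prose
    before or after the JSON.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char in "{[":
            try:
                value, _end = decoder.raw_decode(text, start)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON at this position; keep scanning
    return None


# Invented example reply, not actual output from Llama-Deepsync-3B.
reply = 'Sure! Here is the data: {"name": "Llama-Deepsync-3B", "params": "3B"} Hope that helps.'
print(extract_json(reply))  # {'name': 'Llama-Deepsync-3B', 'params': '3B'}
```

Scanning with raw_decode tolerates text on either side of the JSON; for stricter pipelines you may prefer to reject replies that contain anything besides JSON.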

Architecture

Llama 3.2 is an auto-regressive language model built on an optimized transformer architecture. Its instruction-tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The model supports contexts of up to 128K tokens and can generate up to 8K tokens.
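
Because generation is auto-regressive, each new token is appended to the context, so the prompt and the output together must fit in the context window. The sketch below computes a safe max_new_tokens value for a prompt of known length. It assumes "128K" means 131,072 tokens and "8K" means 8,192 tokens; check the model's configuration before relying on these numbers. The function name max_new_tokens_for is hypothetical.

```python
# Assumed limits: "128K" context read as 131,072 tokens and "8K" output
# read as 8,192 tokens. Verify against the model's config before use.
MAX_CONTEXT_TOKENS = 131_072
MAX_OUTPUT_TOKENS = 8_192


def max_new_tokens_for(prompt_tokens: int, requested: int = MAX_OUTPUT_TOKENS) -> int:
    """Return how many new tokens can be generated for a prompt of this length.

    Every generated token extends the context, so prompt + output must stay
    within MAX_CONTEXT_TOKENS, and output alone within MAX_OUTPUT_TOKENS.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, MAX_OUTPUT_TOKENS, remaining)


print(max_new_tokens_for(1_000))    # 8192: limited by the output cap
print(max_new_tokens_for(128_000))  # 3072: limited by the remaining context
```

The result can be passed as the max_new_tokens argument of the Transformers pipeline call shown in the guide.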

Training

The model has undergone extensive training to improve its coding, mathematics, instruction following, and long-text generation. It has also been fine-tuned to understand and generate structured data and to better support role-play and condition-setting for chatbots.
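
Condition-setting for a chatbot is commonly done through the system message. The sketch below assumes the same role/content message format used in the Transformers pipeline example in the guide and folds a persona plus a list of rules into one system prompt. The helper build_messages is hypothetical, and whether a single system message is the best way to steer this particular model is an assumption, not something this card states.

```python
from typing import Dict, List


def build_messages(persona: str, rules: List[str], user_message: str) -> List[Dict[str, str]]:
    """Assemble a chat in role/content format (hypothetical helper).

    The persona and its conditions become one system message, followed by
    the user's turn.
    """
    system_prompt = persona
    if rules:
        system_prompt += "\nFollow these rules:\n" + "\n".join(f"- {rule}" for rule in rules)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


messages = build_messages(
    "You are a patient math tutor.",
    ["Explain each step.", "Answer in JSON with keys 'steps' and 'answer'."],
    "What is 12 * 7?",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed to the pipeline in place of the pirate-chatbot messages.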

Guide: Running Locally

  1. Install Transformers: Ensure you have transformers >= 4.43.0 installed. Update it using:

    pip install --upgrade transformers
    
  2. Run with Transformers: Use the following Python code to run inference:

    import torch
    from transformers import pipeline
    
    model_id = "prithivMLmods/Llama-Deepsync-3B"
    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
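    # The pipeline applies the chat template and returns the whole conversation;
    # the last message in "generated_text" is the assistant's reply.
    outputs = pipe(messages, max_new_tokens=256)
    print(outputs[0]["generated_text"][-1])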
    

    For detailed recipes, refer to Hugging Face Llama Recipes.

  3. Run with Ollama:

    • Install Ollama: Download and install from ollama.com.
    • Create Model File: Create a Modelfile (here named metallama, matching the -f path used below) whose FROM line points to the GGUF weights you want to run, for example:
      FROM Llama-3.2-1B.F16.gguf
      
    • Create and List the Model:
      ollama create metallama -f ./metallama
      ollama list
      
    • Run the Model:
      ollama run metallama
      
    • Interact with the Model:
      >>> Tell me about Space X.
      Space X, the private aerospace company founded by Elon Musk, is revolutionizing space exploration...
      

    Cloud GPUs: For optimal performance, consider using cloud GPU services like AWS, GCP, or Azure.
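
Step 1 asks for transformers >= 4.43.0. The sketch below confirms the installed version from Python using only the standard library's importlib.metadata. The helpers parse_release and meets_minimum are hypothetical, and the comparison deliberately ignores pre-release suffixes such as .dev0; the packaging library's Version class implements full PEP 440 ordering if you prefer it.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MINIMUM = "4.43.0"  # minimum transformers version stated in step 1


def parse_release(text: str) -> Tuple[int, ...]:
    """Keep the leading numeric parts: '4.46.0.dev0' -> (4, 46, 0).

    Simplification: pre-release and dev suffixes are ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """Compare release tuples after padding with zeros, so '4.43' == '4.43.0'."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = version("transformers")
    status = "OK" if meets_minimum(installed) else "upgrade needed"
    print(f"transformers {installed}: {status}")
except PackageNotFoundError:
    print("transformers is not installed; run: pip install --upgrade transformers")
```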

License

The model is licensed under creativeml-openrail-m, which governs its use and distribution.
