Llama-Deepsync-3B-GGUF
prithivMLmods
Introduction
Llama-Deepsync-3B-GGUF is a fine-tuned version of the Llama-3.2-3B-Instruct base model, optimized for text generation tasks that require deep reasoning and logical structuring. It is intended for applications in education, programming, and creative writing, where it aims to produce precise, contextually relevant outputs.
Architecture
Llama-Deepsync-3B is an auto-regressive language model built on an optimized transformer architecture. It is tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. It supports long-context generation up to 128K tokens and can produce output in more than 29 languages.
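As a rough illustration of what the 128K-token limit means in practice, the sketch below checks whether a prompt plus the requested number of new tokens fits in the context window. It assumes that "128K" means 128 × 1024 = 131,072 tokens; the function names are hypothetical and not part of any library.

```python
# Assumption: "128K tokens" is read as 128 * 1024 = 131,072 tokens.
CONTEXT_WINDOW = 128 * 1024


def fits_context(prompt_tokens, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Return True if the prompt plus the generated tokens fit in the window."""
    return prompt_tokens + max_new_tokens <= context_window


def max_prompt_tokens(max_new_tokens, context_window=CONTEXT_WINDOW):
    """Return the largest prompt length that still leaves room for generation."""
    if not 0 <= max_new_tokens <= context_window:
        raise ValueError("max_new_tokens must be between 0 and the context window")
    return context_window - max_new_tokens
```

With max_new_tokens=256, as in the inference example in the guide, this leaves room for a prompt of up to max_prompt_tokens(256) tokens under the stated assumption.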
Training
The model is enhanced for complex tasks, such as coding and mathematics, through specialized expert models. It shows significant improvements in instruction following, generating long texts, understanding structured data, and producing structured outputs like JSON.
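When the model is asked for structured output such as JSON, the reply may still wrap the object in ordinary prose. The following is a minimal sketch of one way to recover the first JSON object from a reply string using only the Python standard library; the helper name extract_json is hypothetical and this is not a method described by the model authors.

```python
import json


def extract_json(text):
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next opening brace.
            start = text.find("{", start + 1)
    return None
```

A caller can treat a None result as a signal to retry the request or to fall back to plain-text handling.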
Guide: Running Locally
To run the model locally using Transformers:
- Ensure Transformers Version: Update your Transformers library to version 4.43.0 or later using:
    pip install --upgrade transformers
- Set Up a Pipeline:
    import torch
    from transformers import pipeline

    model_id = "prithivMLmods/Llama-Deepsync-3B"
    # Load the model in bfloat16 and let device_map place it on the available hardware
    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
- Run Inference:
    messages = [
        {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
        {"role": "user", "content": "Who are you?"},
    ]
    outputs = pipe(messages, max_new_tokens=256)
    # The last entry of generated_text is the model's reply message
    print(outputs[0]["generated_text"][-1])
- Explore More: For detailed usage recipes, visit Hugging Face Llama Recipes.
Cloud GPUs: If no local GPU is available, cloud services such as AWS, GCP, or Azure offer GPU instances that can speed up inference.
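Two small helpers can make the steps above easier to script. The sketch below checks an installed version string against the 4.43.0 minimum from the guide and pulls the reply text out of the pipeline result. It assumes the result has the shape implied by the guide's indexing, a list whose first item holds a generated_text list of message dictionaries with role and content keys. All function names are hypothetical.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = "4.43.0"  # minimum version stated in the guide


def parse_version(text):
    """Return the leading numeric release parts of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component counts as 0; suffixes such as ".dev0" are ignored.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """Return True if the installed release is at least the minimum release."""
    return parse_version(installed) >= parse_version(minimum)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def last_reply(outputs):
    """Return the text of the final message in the first generated conversation.

    Assumes outputs[0]["generated_text"] is a list of message dictionaries.
    """
    message = outputs[0]["generated_text"][-1]
    return message["content"]
```

For example, print(last_reply(outputs)) would print only the reply text rather than the whole message dictionary. Because the comparison ignores pre-release suffixes, a development build of 4.43.0 is treated as meeting the minimum.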
License
The Llama-Deepsync-3B-GGUF is released under the CreativeML OpenRAIL-M license.