Llama-Deepsync-1B-GGUF
Introduction
The Llama-Deepsync-1B-GGUF is a fine-tuned version of the Llama-3.2-1B-Instruct base model, optimized for text generation tasks that involve deep reasoning and logical structuring. It is particularly effective for applications in education, programming, and creative writing, excelling at generating step-by-step solutions, creative content, and logical analyses. The model supports over 29 languages, including English, French, and Spanish.
Architecture
Llama 3.2 is an auto-regressive language model built on an optimized transformer architecture. It uses supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. The model supports long-context input and generation, handling up to 128K tokens of context and generating up to 8K tokens.
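As a quick sanity check of the advertised context length, the model configuration can be inspected directly. This is a minimal sketch, assuming the repository id used later in this guide and the standard Llama config attribute in transformers.

```python
# Minimal sketch: read the context window from the model config.
# Assumes the repository id used in the guide below and the standard
# Llama attribute name in transformers.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("prithivMLmods/Llama-Deepsync-1B")
print(config.max_position_embeddings)  # expected to be on the order of 128K
```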
Training
The model is fine-tuned to enhance its capabilities in coding, mathematics, and instruction following. It is robust to diverse prompts, which improves role-play and system-prompt conditioning for chatbots, and it can generate structured outputs such as JSON and work with structured data such as tables (see the sketch below).
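To illustrate the structured-output claim, the sketch below asks the model for JSON and parses the reply. The schema, prompts, and generation settings are illustrative assumptions rather than part of the model card, and parsing can fail if the model strays from the requested format.

```python
# Illustrative sketch of requesting JSON output; the schema and prompts
# are assumptions, not part of the model card.
import json

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="prithivMLmods/Llama-Deepsync-1B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": 'Respond only with valid JSON of the form {"answer": string, "steps": [string]}.',
    },
    {"role": "user", "content": "What is 17 * 24? Show your steps."},
]
outputs = pipe(messages, max_new_tokens=200)
reply = outputs[0]["generated_text"][-1]["content"]  # assistant message text
print(json.loads(reply))  # raises if the model deviates from the schema
```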
Guide: Running Locally
- Install Transformers: Ensure you have `transformers >= 4.43.0` by running `pip install --upgrade transformers`.
- Set Up Environment: Import the necessary libraries and set up a pipeline for text generation.
```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Llama-Deepsync-1B"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
- Run Inference: Use the pipeline to generate responses.
```python
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
```
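Note that for chat-style inputs the pipeline returns the full conversation: `outputs[0]["generated_text"]` is the list of messages, and its last element is the assistant's reply as a `{"role": ..., "content": ...}` dict.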
- Ollama Setup:
  - Install Ollama from the official site (https://ollama.com).
  - Create a `Modelfile` and specify the base model.
  - Use `ollama create` to set up the model and `ollama run` to start it; a minimal sketch follows this list.
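The following is a minimal sketch of that flow; the GGUF filename and the local model name are assumptions, so substitute the quantized file you actually downloaded from the repository.

```sh
# Minimal sketch of the Ollama flow. The GGUF filename and the local model
# name are assumptions; replace them with your downloaded file.
echo 'FROM ./Llama-Deepsync-1B.Q4_K_M.gguf' > Modelfile
ollama create llama-deepsync-1b -f Modelfile
ollama run llama-deepsync-1b
```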
Cloud GPUs
For enhanced performance, consider using cloud GPU services like AWS or Google Cloud's AI Platform.
License
The model is licensed under the CreativeML Open RAIL-M license.