Llama-3.1-8B-Open-SFT

prithivMLmods

Introduction

Llama-3.1-8B-Open-SFT is a text generation model fine-tuned from meta-llama/Llama-3.1-8B-Instruct using Supervised Fine-Tuning (SFT). It is designed to excel at conversational interaction, question answering, and chain-of-thought reasoning, with improved performance on context-sensitive and instruction-following tasks.

Architecture

The model uses a sharded checkpoint layout, distributing its 8 billion parameters across four shards for efficient loading and large-scale deployment. Key features include Chain-of-Thought (CoT) reasoning, conversational AI capabilities, and multi-purpose support for NLP tasks such as summarization and text completion; the conversational interface is sketched below.
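
Since the model targets conversational use, prompts are typically built with the tokenizer's chat template, presumably inherited from the Llama-3.1-Instruct base (a minimal sketch; the example system and user messages are illustrative placeholders):

    from transformers import AutoTokenizer
    
    tokenizer = AutoTokenizer.from_pretrained("prithivMLmods/Llama-3.1-8B-Open-SFT")
    
    # Format a conversation with the built-in chat template;
    # the system/user contents here are illustrative placeholders
    messages = [
        {"role": "system", "content": "You are a helpful assistant that reasons step by step."},
        {"role": "user", "content": "If a train travels 60 km in 45 minutes, what is its average speed?"},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    print(prompt)  # chat-formatted string ready to pass to model.generate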

Training

  • Base Model: meta-llama/Llama-3.1-8B-Instruct
  • Dataset: O1-OPEN/OpenO1-SFT, containing 77.7k samples focused on instruction-based and open-domain tasks.
  • Objective: extensive supervised fine-tuning on open-domain, instruction-style data to improve performance across a variety of applications. The data can be inspected as shown below.
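
To take a quick look at the fine-tuning data, the corpus can be loaded directly from the Hub (a minimal sketch, assuming the datasets library is installed and the corpus exposes a train split):

    from datasets import load_dataset
    
    # Pull the ~77.7k-sample SFT corpus referenced above
    dataset = load_dataset("O1-OPEN/OpenO1-SFT", split="train")
    print(dataset)     # dataset size and column names
    print(dataset[0])  # one instruction/response record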

Guide: Running Locally

  1. Environment Setup:

    • Install the transformers library from Hugging Face (pip install transformers).
    • Ensure Python 3 and PyTorch are installed (pip install torch).
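    As a quick sanity check before downloading an 8B checkpoint, confirm the environment (a minimal sketch; any reasonably recent transformers/PyTorch release should work):

    import torch
    import transformers
    
    # Verify library versions and GPU visibility before loading the model
    print("transformers:", transformers.__version__)
    print("torch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
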
  2. Loading the Model:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    model_name = "prithivMLmods/Llama-3.1-8B-Open-SFT"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # device_map="auto" spreads the four shards across available devices
    # (requires the accelerate package); omit it to load on CPU
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    
  3. Inference Example:

    prompt = "Explain the concept of gravity in a simple way suitable for a 10-year-old:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=150, temperature=0.7)
    
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print("Model Output:", response)
    
  4. Hardware Recommendations:

    • Use high-performance GPUs with at least 16 GB of VRAM for half-precision (FP16/BF16) inference, or around 8 GB for 8-bit quantized weights; the sketch below shows where these figures come from.
    • Consider cloud GPU services such as AWS, GCP, or Azure for scalable resources.
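    These figures follow from simple weight-size arithmetic (a rough estimate; activations and the KV cache add overhead on top of the weights):

    # Back-of-envelope VRAM needed just to hold 8B parameters
    params = 8e9
    for precision, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
        print(f"{precision}: ~{params * bytes_per_param / 1e9:.0f} GB")
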
  5. Optimization Options:

    • Use Safetensors weight files for safe, fast loading.
    • Apply quantization or model parallelism in resource-constrained environments; see the 4-bit loading sketch below.
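
    As one concrete quantization route, the weights can be loaded in 4-bit via bitsandbytes (a sketch assuming the bitsandbytes and accelerate packages are installed):

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    
    # Quantize the weights to 4-bit NF4 at load time to fit smaller GPUs
    bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
    model = AutoModelForCausalLM.from_pretrained(
        "prithivMLmods/Llama-3.1-8B-Open-SFT",
        quantization_config=bnb_config,
        device_map="auto",
    )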

License

The Llama-3.1-8B-Open-SFT model is released under the CreativeML Open RAIL-M license, which permits open and flexible use while imposing use-based restrictions intended to encourage responsible deployment.
