Llama-3.1-8B-Open-SFT
Introduction
Llama-3.1-8B-Open-SFT is a text-generation model fine-tuned from meta-llama/Llama-3.1-8B-Instruct. It is designed for conversational interaction, question answering, and chain-of-thought reasoning, using Supervised Fine-Tuning (SFT) to improve performance on context-sensitive and instruction-following tasks.
Architecture
The model is built on a scalable sharded architecture, distributing its 8 billion parameters over four shards. This design ensures efficient loading and deployment for large-scale applications. Key features include Chain-of-Thought (CoT) reasoning, conversational AI capabilities, and multi-purpose functionality supporting various NLP tasks such as summarization and text completion.
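As a concrete illustration of the sharded layout, the snippet below groups tensors by shard file the way a checkpoint's `model.safetensors.index.json` does. The tensor and file names here are hypothetical stand-ins, not the actual contents of this model's index:

```python
from collections import defaultdict

# Hypothetical weight map in the style of model.safetensors.index.json,
# which assigns each tensor name to one of the checkpoint's shard files.
weight_map = {
    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "lm_head.weight": "model-00004-of-00004.safetensors",
}

def tensors_per_shard(weight_map):
    """Group tensor names by the shard file that stores them."""
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return dict(shards)

print(tensors_per_shard(weight_map))
```

At load time, `from_pretrained` reads the real index file and pulls each tensor from the shard it lives in, so no single file has to hold all 8 billion parameters.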
Training
- Base Model: meta-llama/Llama-3.1-8B
- Dataset: O1-OPEN/OpenO1-SFT, containing 77.7k samples focused on instruction-based and open-domain tasks.
- The model has been optimized for open-domain tasks through extensive supervised fine-tuning to enhance its performance across a variety of applications.
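To make the SFT setup concrete, here is a minimal sketch of turning one instruction/response pair into a single training string. The `instruction`/`output` field names and the template are illustrative assumptions, not the exact OpenO1-SFT schema or the trainer's actual template:

```python
def format_sft_example(example, eos_token="<|eot_id|>"):
    """Concatenate a prompt and its target response for supervised fine-tuning.

    The '### Instruction / ### Response' template is a generic SFT convention
    used here for illustration only.
    """
    prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:\n"
    return prompt + example["output"] + eos_token

sample = {
    "instruction": "What is 2 + 2?",
    "output": "Let's reason step by step. 2 + 2 = 4. The answer is 4.",
}
print(format_sft_example(sample))
```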
Guide: Running Locally
- Environment Setup:
  - Ensure you have Python and PyTorch installed.
  - Install the `transformers` library by Hugging Face.
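Before downloading any weights, a quick sanity check that the required packages are importable can save time. This small helper only inspects the environment, it does not load the model:

```python
import importlib.util

def check_env(packages=("torch", "transformers")):
    """Return a {package: available} map based on whether each can be imported."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

missing = [pkg for pkg, ok in check_env().items() if not ok]
print("Missing packages:", missing or "none")
```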
- Loading the Model:

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "prithivMLmods/Llama-3.1-8B-Open-SFT"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)
  ```
- Inference Example:

  ```python
  prompt = "Explain the concept of gravity in a simple way suitable for a 10-year-old:"
  inputs = tokenizer(prompt, return_tensors="pt")
  # do_sample=True is required for temperature to take effect;
  # max_new_tokens bounds the generated text rather than prompt + output.
  outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
  response = tokenizer.decode(outputs[0], skip_special_tokens=True)
  print("Model Output:", response)
  ```
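Since this is an instruction-tuned Llama 3.1 checkpoint, prompts generally perform better in the Llama 3 chat layout; in real code `tokenizer.apply_chat_template` produces it for you. The sketch below rebuilds that layout by hand purely for illustration, and the special-token spellings are assumptions to verify against this model's tokenizer config:

```python
def build_llama3_prompt(user_message, system_message="You are a helpful assistant."):
    """Hand-rolled Llama 3-style chat prompt.

    Prefer tokenizer.apply_chat_template in practice; this just shows the shape.
    """
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("Explain gravity simply for a 10-year-old.")
print(prompt)
```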
- Hardware Recommendations:
  - Use a high-performance GPU with at least 16 GB of VRAM for full precision, or 8 GB for quantized models.
  - Consider cloud GPU services such as AWS, GCP, or Azure for scalable resources.
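Those VRAM figures follow from a back-of-envelope estimate of weight memory alone (activations and the KV cache add further overhead on top):

```python
def weight_memory_gib(n_params=8e9, bytes_per_param=2.0):
    """Approximate memory needed for model weights alone, in GiB."""
    return n_params * bytes_per_param / 1024**3

# 8B parameters at various precisions:
print(f"fp16/bf16: {weight_memory_gib(bytes_per_param=2.0):.1f} GiB")  # ~14.9
print(f"int8:      {weight_memory_gib(bytes_per_param=1.0):.1f} GiB")  # ~7.5
print(f"4-bit:     {weight_memory_gib(bytes_per_param=0.5):.1f} GiB")  # ~3.7
```

Half precision lands just under the 16 GB recommendation, and 8-bit or 4-bit quantization brings the weights within reach of an 8 GB card.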
- Optimization Options:
  - Use Safetensors for fast, secure weight loading.
  - Apply quantization or model parallelism in resource-constrained environments.
License
The Llama-3.1-8B-Open-SFT model is released under the CreativeML Open RAIL-M license, which allows for open and flexible use while ensuring ethical and responsible deployment.