TinyLlama-1.1B-Chat-v1.0

Introduction
TinyLlama-1.1B-Chat-v1.0 is a conversational AI model based on the Llama architecture, optimized for efficient deployment in scenarios with limited computational resources. It is designed to generate human-like text in response to user prompts.
Architecture
TinyLlama-1.1B retains the architecture and tokenizer of Llama 2, offering compatibility with various open-source projects. It contains 1.1 billion parameters, making it compact and suitable for applications with constrained computational and memory capacities.
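The 1.1 billion parameter count translates directly into a weight footprint. A rough back-of-the-envelope estimate (weights only, ignoring activations and the KV cache) shows why the model fits on modest hardware:

```python
# Approximate weight memory for a 1.1B-parameter model at common precisions.
# This is an estimate only: it excludes activations, KV cache, and overhead.
PARAMS = 1.1e9  # parameter count stated in the model card

def weight_memory_gb(bytes_per_param: float) -> float:
    """Approximate weight memory in GiB for a given bytes-per-parameter."""
    return PARAMS * bytes_per_param / 1024**3

print(f"float32 : {weight_memory_gb(4):.1f} GiB")  # full precision
print(f"bfloat16: {weight_memory_gb(2):.1f} GiB")  # half precision
print(f"int8    : {weight_memory_gb(1):.1f} GiB")  # 8-bit quantization
```

In bfloat16 the weights alone come to roughly 2 GiB, which is why the model is practical on consumer GPUs and even CPUs.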
Training
The model is fine-tuned from the TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T checkpoint using Hugging Face's Zephyr training methodology. Initially trained on a version of the UltraChat dataset, the model was further refined using the UltraFeedback dataset, which includes 64,000 ranked prompts and model completions.
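The UltraFeedback stage trains on ranked prompt/completion pairs: the model is optimized to prefer the higher-ranked completion. As a conceptual sketch only (the field names below are illustrative, not the dataset's exact schema), each preference record pairs a chosen response with a rejected one:

```python
# Illustrative shape of a preference record used in DPO-style alignment.
# NOTE: field names here are hypothetical, not the exact UltraFeedback schema.
preference_record = {
    "prompt": "Explain why the sky is blue.",
    "chosen": "Sunlight scatters off air molecules; shorter (blue) "
              "wavelengths scatter the most, so the sky looks blue.",
    "rejected": "The sky is blue because the ocean reflects onto it.",
}

# The trainer adjusts the model to assign higher likelihood to "chosen"
# than to "rejected" for the same prompt.
for key in ("prompt", "chosen", "rejected"):
    assert key in preference_record
```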
Guide: Running Locally
To run the TinyLlama-1.1B-Chat-v1.0 model locally, follow these steps:
- Install Dependencies:
  - Ensure you have `transformers` version 4.34 or higher.
  - Install the necessary packages:

    ```shell
    pip install git+https://github.com/huggingface/transformers.git
    pip install accelerate
    ```
- Setup and Run:
  - Use the following Python script to generate text:

    ```python
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    messages = [
        {
            "role": "system",
            "content": "You are a friendly chatbot who always responds in the style of a pirate",
        },
        {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    ]
    prompt = pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    outputs = pipe(
        prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95
    )
    print(outputs[0]["generated_text"])
    ```
- Consider Cloud GPUs: To efficiently handle the model's computational demands, consider utilizing cloud-based GPU services such as AWS, Google Cloud, or Azure.
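The `apply_chat_template` call in the script above serializes the message list into the model's chat format before generation. As a minimal sketch, a Zephyr-style template looks roughly like the following; treat the exact role markers and separators as an assumption, and always rely on the tokenizer's built-in template in practice:

```python
def render_zephyr_style(messages, add_generation_prompt=True):
    """Sketch of a Zephyr-style chat template.

    Assumption: <|system|>/<|user|>/<|assistant|> role markers with </s>
    separators. The tokenizer's own chat template is authoritative.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}</s>" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|assistant|>")
    return "\n".join(parts)

messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Hello!"},
]
print(render_zephyr_style(messages))
```

Seeing the serialized prompt makes it clear why `add_generation_prompt=True` matters: it appends the assistant marker so generation starts as a reply rather than a continuation of the user turn.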
License
The TinyLlama-1.1B-Chat-v1.0 model is licensed under the Apache-2.0 license, which allows for broad usage, modification, and distribution of the software.