TinyLlama-1.1B-Chat-v1.0

Introduction

TinyLlama-1.1B-Chat-v1.0 is a conversational model built on the Llama architecture and sized for deployment where compute and memory are limited. It generates human-like responses to user prompts.

Architecture

TinyLlama-1.1B adopts the same architecture and tokenizer as Llama 2, so it can be plugged into many open-source projects built around Llama. With 1.1 billion parameters, it is compact enough for applications with constrained compute and memory.
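
As a quick sanity check, the shared architecture family and parameter count can be verified with the transformers library. This is a minimal sketch, not part of the official instructions; the first call downloads a couple of gigabytes of weights:

  from transformers import AutoConfig, AutoModelForCausalLM

  # The config reports the same "llama" architecture family as Llama 2.
  config = AutoConfig.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
  print(config.model_type)  # -> "llama"

  # Loading the weights confirms the roughly 1.1 billion parameter count.
  model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
  print(f"{sum(p.numel() for p in model.parameters()):,} parameters")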

Training

The model is fine-tuned from the TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T checkpoint following Hugging Face's Zephyr training recipe: supervised fine-tuning on a variant of the UltraChat dataset, followed by preference alignment (DPO) on the UltraFeedback dataset, which contains 64,000 prompts with ranked model completions.
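
For illustration, preference pairs of the kind used in that alignment stage can be inspected with the datasets library. The dataset ID and split name below are assumptions based on the Zephyr recipe's public release, not something stated in this card:

  from datasets import load_dataset

  # Assumed dataset ID/split from the Zephyr recipe's binarized UltraFeedback release.
  prefs = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
  example = prefs[0]
  print(example["prompt"])    # the original user prompt
  print(example["chosen"])    # the higher-ranked conversation
  print(example["rejected"])  # the lower-ranked conversation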

Guide: Running Locally

To run the TinyLlama-1.1B-Chat-v1.0 model locally, follow these steps:

  1. Install Dependencies:

    • transformers version 4.34 or later is required; if your installed version is older, install it from source along with accelerate:
      pip install git+https://github.com/huggingface/transformers.git
      pip install accelerate
      
  2. Setup and Run:

    • Use the following Python script to generate text:
      import torch
      from transformers import pipeline
      
      # Load the chat model; bfloat16 halves memory use, and device_map="auto"
      # places the weights on an available GPU (this requires accelerate).
      pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")
      
      messages = [
          {
              "role": "system",
              "content": "You are a friendly chatbot who always responds in the style of a pirate",
          },
          {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
      ]
      # Format the conversation with the model's built-in chat template, appending
      # the assistant turn marker so generation continues from there.
      prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
      outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
      print(outputs[0]["generated_text"])
      
  3. Consider Cloud GPUs: For faster inference or heavier workloads, consider cloud GPU services such as AWS, Google Cloud, or Azure. For memory-constrained local hardware, see the quantization sketch after this list.
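
If local GPU memory is the bottleneck rather than raw speed, 4-bit quantization is one way to shrink the footprint. The sketch below uses transformers' BitsAndBytesConfig and assumes the optional bitsandbytes package is installed; it is an alternative loading path, not part of the official instructions:

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  # Assumption: bitsandbytes is installed (pip install bitsandbytes) and a CUDA GPU is present.
  # NF4 quantization cuts the ~2.2 GB bfloat16 weight footprint to well under 1 GB.
  bnb = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_quant_type="nf4",
      bnb_4bit_compute_dtype=torch.bfloat16,
  )
  model = AutoModelForCausalLM.from_pretrained(
      "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
      quantization_config=bnb,
      device_map="auto",
  )
  tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")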

License

The TinyLlama-1.1B-Chat-v1.0 model is licensed under the Apache-2.0 license, which allows for broad usage, modification, and distribution of the software.
